Abstract

We show here that every non-adaptive property testing algorithm making a constant
number of queries, over a fixed alphabet, can be converted to a sample-based (as per
[Goldreich and Ron, 2015]) testing algorithm whose average number of queries is a fixed, smaller
than 1, power of $n$. Since the query distribution of the sample-based algorithm does not
depend at all on the property or on the original algorithm, this has many implications in
scenarios where many properties need to be tested for concurrently, such as
testing (relatively large) unions of properties, or converting a Merlin-Arthur Proximity proof
(as per [Gur and Rothblum, 2013]) to a proper testing algorithm.

The proof method involves preparing the original testing algorithm for a combinatorial 
analysis, which in turn involves a new result about the existence of combinatorial structures 
(essentially generalized sunflowers) that allow the sample-based tester to replace the original 
constant query complexity tester. 


1 Introduction 

A test for a property $L \subseteq \Sigma^n$ (where $\Sigma$ is a fixed alphabet), with proximity parameter $\epsilon$, is an
algorithm that queries an input $w \in \Sigma^n$ in a limited number of places, and distinguishes with
high probability between the case that $w \in L$ and the case that no $w' \in L$ is $\epsilon$-close to $w$ in the
normalized Hamming distance. A non-adaptive test is a test that decides its queries in advance
of receiving the corresponding input values, which basically means that its queries are governed
by a single distribution $\mu$ over the power set of $[n]$.

Given a family of properties $\mathcal{F} \subseteq \{L : L \subseteq \Sigma^n\}$, we say that there is a canonical testing
scheme for $\mathcal{F}$ if there are non-adaptive tests (with the same parameter $\epsilon$) for all members $L \in \mathcal{F}$,
which additionally all share the same query probability distribution $\mu$.

This concept has been defined and used before. The most well-known example is that of
[10], where the family of all properties of dense graphs (as per the model defined in [7]) with
$n$ vertices that are testable (non-adaptively or not) with up to $q$ queries is shown to have a
canonical testing scheme, where the common query distribution consists of uniformly picking a
set of $q$ vertices and querying all $\binom{q}{2}$ vertex pairs.

Note that being in the dense graph model in essence restricts the admissible properties.
Under this model, an input $w \in \{0,1\}^{\binom{n}{2}}$ is interpreted as the adjacency matrix of a graph
with $n$ vertices, and a property $L$ is admissible if it is invariant under all input transformations

*Faculty of Computer Science, Israel Institute of Technology (Technion), Haifa, Israel, eldar@cs.technion.ac.il

†Birkbeck, University of London, London, UK. oded@dcs.bbk.ac.uk

‡Faculty of Computer Science, Israel Institute of Technology, Haifa, Israel, yaduvasudev@gmail.com


corresponding to re-labeling the graph vertices (i.e., the transformations corresponding to graph 
isomorphisms). 

There are other examples. For example, given a finite field $F$, for properties of functions over
a linear space over $F$ that are known to be invariant under linear transformations, a canonical
testing scheme would consist of querying the function over an entire small dimensional subspace
picked uniformly at random [2].

A natural question is what would be a candidate for an “ultimate” canonical scheme, where 
there are no structural impositions on the property at all. One would expect here a query 
distribution that is completely symmetric with respect to any permutation of the index set [n]. 
Indeed, such a scheme is defined as sample-based testing in [9]. The sample-based distribution
$\mu_p$ corresponds to choosing every index $i \in [n]$ to be queried independently with probability $p$.
Usually $p$ will be $n^{-\alpha}$ for some $1 > \alpha > 0$. It is folly to expect a sample-based testing scheme
with significantly fewer queries, even for properties with a constant bound on the number of
queries for a test, as evidenced already in [7, Proposition 6.9].

In [9] a connection between proximity-oblivious testers (POT) as defined in [8] and the
sample-based querying scheme was suggested. Proximity-oblivious testers are non-adaptive
testing algorithms whose querying distribution is the same for any proximity parameter $\epsilon$, where
instead the distinguishing probability between inputs in $L$ and inputs $\epsilon$-far from $L$ changes with
$\epsilon$. The work [9] showed that for such testers that additionally have the property that all indexes
get queried with about the same probability (but not necessarily in an independent manner),
there exists a conversion to sample-based testers with $p = O(n^{-1/q})$, where the coefficient
depends on the distinguishing probability and on the parameter measuring the above-mentioned
“probability sameness” of the original test.

In [6] it is shown that all 1-sided proximity-oblivious testers over the alphabet $\{0,1\}$ are
convertible to the canonical sample-based scheme, where $p = n^{-\alpha}$ with $\alpha$ depending (somewhat
badly) on $q$, $\epsilon$ and the distinguishing probability $\delta$. In [9] there is an example of a testable
property that has no sublinear query complexity sample-based test at all, but it works only
over an alphabet whose size is exponential in $n$, and so does not contradict the result of [6].

Here we take the investigation much further, and prove the following. 

Theorem 1.1 (informal statement of our main result). Every property of words in $\Sigma^n$ that
has a non-adaptive $\epsilon$-test with $q$ queries and detection probability $\delta$ (either 1-sided or 2-sided)
admits a test using the sample-based canonical querying scheme, where the distribution $\mu_p$ has
$p = O(n^{-\alpha})$, with $\alpha$ depending on $q$, $\delta$, and for 2-sided testing also on $|\Sigma|$ and $\epsilon$, and the hidden
coefficient depending on $q$, $\delta$, $\epsilon$ and $|\Sigma|$.

We prove this separately for 1-sided tests and 2-sided tests. For 2-sided tests we go further
and prove the result for partial tests, which are only guaranteed to accept inputs in some
sub-property $L'$ with high probability, a generalization whose relevance is explained below.

For both 1-sided testing and 2-sided (possibly partial) testing we obtain a much improved
bound on $\alpha$ as compared to the 1-sided testing result of [6]. Additionally, the dependency of
the coefficient on $|\Sigma|$ is logarithmic, while for the 2-sided test the additional dependency of $\alpha$
on $|\Sigma|$ is of type $\log\log\log(|\Sigma|)$. This shows that the exponential size of the alphabet in the
counter-example in [9] is essential.

We believe that the “correct” $\alpha$ should be just $1/q$, at least for converting 1-sided tests,
but cannot prove it yet.

1.1 Implications for multitests 

There are several motivations for finding canonical testing schemes. One of them is for proving 
lower bounds, which may be easier when the querying distribution is “simple” and known. 
Here we would like to highlight another one, which also played an implicit role in the original
motivation of [6]. 

Given a sequence of properties $L_1, \ldots, L_r$, a multitest for them is an algorithm that makes
queries to a word $w \in \Sigma^n$, and provides a sequence of answers. With probability at least $1 - \delta$,
the answers should be correct for all the properties, that is, for every $k$ such that $w \in L_k$
the corresponding answer should be “yes”, and for every $l$ such that $w$ is $\epsilon$-far from $L_l$ the
corresponding answer should be “no”.

If we know nothing else about the properties apart from that they are all testable using $q$
queries each, then the scalability of a test to a multitest would be quasilinear: we first take a
test for every $L_j$, and amplify its success probability to $1 - \delta/r$ (which multiplies the number
of queries by $O(\log r)$). Then we just run these $r$ tests one after the other, and use the union
bound for the total success probability, all in all using $O(q \cdot r \log r)$ queries. This is not always
good enough, as in some applications $r$ can depend on $n$, and may even be greater than $n$ (say
through a polynomial dependency).

However, the situation changes dramatically if we know all properties to share a canonical
testing scheme with $q'$ queries (where $q'$ could depend on $n$). In this case, we can re-use the
same queries for all $r$ (amplified) tests, and the union bound will still work. This brings us to
using only $O(q' \cdot \log r)$ queries in all. This scalability can have many implications.
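
The two query counts above can be compared with a few lines of arithmetic. The sketch below is illustrative only: it assumes that every base test errs with probability at most 1/3 and is amplified by a majority vote over independent runs, and the constant 18 comes from that assumed Hoeffding calculation rather than from the text. All function names are hypothetical.

```python
import math


def repetitions_for(delta: float, r: int) -> int:
    """Runs needed so a majority vote errs with probability <= delta / r.

    Assumption: each run errs with probability at most 1/3, so the majority
    is wrong only if the empirical success rate falls by at least 1/6.
    Hoeffding then gives exp(-2 * t * (1/6)**2) <= delta / r.
    """
    return math.ceil(18 * math.log(r / delta))


def naive_multitest_queries(q: int, r: int, delta: float) -> int:
    """r separately amplified q-query tests, run one after the other."""
    return r * q * repetitions_for(delta, r)


def canonical_multitest_queries(q_shared: int, r: int, delta: float) -> int:
    """All r amplified tests re-use the same q_shared-query samples."""
    return q_shared * repetitions_for(delta, r)


if __name__ == "__main__":
    for r in (10, 1_000, 100_000):
        print(r, naive_multitest_queries(5, r, 0.1),
              canonical_multitest_queries(5, r, 0.1))
```

The first count grows like $r \log r$ and the second like $\log r$, which is the gap the canonical scheme exploits; in the setting of this paper the shared query count $q'$ itself depends on $n$.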

In [6], multitests are implicitly used for testing unions of properties. This in turn allows
converting, in certain cases, tests requiring proofs as per the MAP scenario (defined in [11] and
also developed in [6]) to tests that still have a sublinear query complexity but do not require
such proofs. In this setting we deploy the generalization of our result to partial testing, as a
MAP scenario converts to a union of partial testing problems.

Another scenario aided by a multitest is if one wants to store the results for w belonging 
(approximately) to a rather large set of possible properties. If the properties share a canonical 
testing scheme, and the corresponding property tests also admit a not too large computation 
time overhead, then it may be worthwhile to store instead the common set of queries performed 
by the multitest, because this query set increases rather slowly with r. 

Finally, a canonical testing scheme also allows for some measure of privacy: Suppose that 
one wants to test a particular property of $w \in \Sigma^n$, but wants to hide from the “input holder”
the identity of the particular property to be tested. By using the canonical scheme, no one but 
the party performing the test can discern which of the properties having the canonical scheme 
is being tested for. 

1.2 Methods used 

The crucial analysis used for converting a test with q queries to a sample-based test is of a 
combinatorial nature. We take the support of the query distribution of a non-adaptive test, and 
analyze it as a family of query sets, essentially a $q$-uniform hypergraph whose vertex set is the
domain of possible queries. 

For 1-sided testing algorithms, since they reduce to checking whether the set of queries is 
a witness refuting the possibility of the input belonging to the property, the support of the 
distribution provides most of the information we need. We can assume (through a simple 
processing of the original test) that the number of possible query sets is linear in the domain 
size n. Finding large “matchings” (families of disjoint sets) of refuting witnesses would be ideal 
for sample based testing, but since we make no assumptions on our family of sets outside its 
size, these do not always exist. 

The next option that could be explored is finding large sunflowers as defined in [2]. This is
the approach taken by [6], and it can be generalized to the setting here. However, the obtained
$\alpha$ value for the $n^{-\alpha}$ sampling test would depend very badly on the other parameters, because
sunflowers require processing in several stages.

Here we present a generalization of sunflowers, which we call pompoms. The main difference 
is that the “core” common to the participating sets is not their intersection as in sunflowers, and 
in fact could be much larger - the only requirement is that the participating sets are disjoint 
outside the core. The support of the query distribution is shown to admit pompoms larger than 
the sunflowers it would admit, and moreover ones that all share the same core, so we can rule 
out the possible inputs in just one processing round. 

For 2-sided testing we use pompoms to explicitly estimate what some portions of the original
test would say for a particular input. Because of the estimation requirements, we need an
additional requirement of the 2-sided test: it has to be combinatorial, in the sense that its
query distribution is uniform over the family of possible query sets. Much work is needed to
fully convert a general 2-sided test to a combinatorial one that can be analyzed as a hypergraph,
and this introduces some extra dependency on the alphabet size. To aid with the analysis of
the 2-sided tests, a formalism of probabilistic formulas is introduced. The combinatorialization
shown here has potential for future uses, as it generalizes to promise problems apart from
property testing: it is enough to have at least one “yes” instance and one “robust no” instance.

The pompom families used in both the 1-sided and the 2-sided sample-based testing results
stem from a general structural result about the query distributions, which finds a
constellation in the underlying support. This in turn readily allows for the extraction of pompoms.


2 Preliminaries 


2.1 Large deviation bounds 

The following is useful for the analysis of sample based testing. 


Lemma 2.1 (multiplicative Chernoff bounds). Let $X_1, \ldots, X_m$ be i.i.d. 0-1 random variables
such that $\Pr[X_i = 1] = p$. Let $X = \sum_{i=1}^{m} X_i$. For any $\gamma \in (0,1]$,

$$\Pr[X > (1+\gamma)pm] < \exp(-\gamma^2 pm/3)$$

$$\Pr[X < (1-\gamma)pm] < \exp(-\gamma^2 pm/2)$$

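Lemma 2.2 (Hoeffding bounds, [12]). Let $Y_1, \ldots, Y_m$ be independent random variables such
that $0 \le Y_i \le 1$ for $i = 1, \ldots, m$, and let $\mu = \mathrm{E}[\sum_{i=1}^{m} Y_i/m]$. Then,

$$\Pr\left[\left|\frac{\sum_{i=1}^{m} Y_i}{m} - \mu\right| > t\right] < 2\exp(-2mt^2)$$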


Lemma 2.3 (without replacement, [12]). Let $X_1, \ldots, X_m$ be random variables picked uniformly
without repetition from the sequence $C = (\gamma_1, \ldots, \gamma_n)$ where $0 \le \gamma_i \le 1$ (this means that
$i_1, \ldots, i_m$ are picked uniformly without repetition from $[n]$, and then every $X_j$ is set to $\gamma_{i_j}$;
it may be that some $\gamma_i$ is equal to another). Let $Y_1, \ldots, Y_m$ be independent random variables
picked with repetition from $C$ (i.e. every $k_j$ is uniformly and independently chosen from $[n]$
and then $Y_j$ is set to $\gamma_{k_j}$). Then the conclusion of Lemma 2.2 for $Y_1, \ldots, Y_m$ holds also for
$X_1, \ldots, X_m$, that is,

$$\Pr\left[\left|\frac{\sum_{i=1}^{m} X_i}{m} - \gamma\right| > t\right] < 2\exp(-2mt^2)$$

where $\gamma = \frac{1}{n}\sum_{i=1}^{n} \gamma_i$.

Lemma 2.4 (large deviation bound). Denote by $\mu_p$ the distribution over subsets of $[m]$, where
every $i \in [m]$ is picked into the subset with probability exactly $p$, independently from all other
$j \neq i$. Suppose that $\gamma_1, \ldots, \gamma_m$ are all values in $[0,1]$, and let $U \subseteq [m]$ be chosen according to $\mu_p$,
where $p \ge [\ldots]\,c/(\eta^2 m)$ and $c \ge 1$. Then, with probability at least $1 - e^{-c}$, the value $(\sum_{i \in U} \gamma_i)/|U|$
(where we arbitrarily fix its value if $U = \emptyset$) is in the range $(\sum_{i=1}^{m} \gamma_i)/m \pm \eta$.

Proof. If $U$ is picked according to $\mu_p$, then $\mathrm{E}[|U|] = pm$. By the multiplicative Chernoff bound
in Lemma 2.1 we have the following bound on the probability of the size of $U$ being small:

$$\Pr[|U| < pm/2] < \exp(-pm/8) < e^{-c}/2.$$

We continue our analysis conditioned on the event that the size of $U$ is at least $pm/2$. For every
$k \ge pm/2$, let us analyze separately the deviation of the value $(\sum_{i \in U} \gamma_i)/|U|$ conditioned on
$|U| = k$. Lemma 2.3 holds for this case, stating that the probability of
$\left|\frac{1}{k}\sum_{i \in U} \gamma_i - \frac{1}{m}\sum_{i=1}^{m} \gamma_i\right|$
being greater than $\eta$ is bounded by $e^{-c}/2$. Hence, for $U$ picked according to $\mu_p$, the probability
of $(\sum_{i \in U} \gamma_i)/|U|$ being outside the range $(\sum_{i=1}^{m} \gamma_i)/m \pm \eta$ is at most $e^{-c}$ by the union bound on
the event of $|U| < pm/2$, and the event of $\left|\frac{1}{|U|}\sum_{i \in U} \gamma_i - \frac{1}{m}\sum_{i=1}^{m} \gamma_i\right| > \eta$ while $|U| \ge pm/2$. □
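
The statement of Lemma 2.4 can be checked empirically. The following is a minimal simulation sketch, not part of the proof: it draws $U$ by independent coin flips, compares the sampled average with the true average, and reports how often the deviation exceeds $\eta$. The function names are hypothetical, and an empty sample is simply counted as a failure, which is one arbitrary convention among several.

```python
import random


def sampled_mean(gammas, p, rng):
    """Average of the values whose index lands in a p-sample, or None if empty."""
    picked = [g for g in gammas if rng.random() < p]
    if not picked:
        return None
    return sum(picked) / len(picked)


def deviation_rate(gammas, p, eta, trials, seed=0):
    """Fraction of trials in which the sampled mean misses the true mean by > eta."""
    rng = random.Random(seed)
    true_mean = sum(gammas) / len(gammas)
    bad = 0
    for _ in range(trials):
        estimate = sampled_mean(gammas, p, rng)
        # An empty sample is counted as a failure (an assumption of this sketch).
        if estimate is None or abs(estimate - true_mean) > eta:
            bad += 1
    return bad / trials


if __name__ == "__main__":
    values = [i % 2 for i in range(2000)]  # true mean is exactly 0.5
    for p in (0.01, 0.05, 0.5):
        print(p, deviation_rate(values, p, eta=0.1, trials=500))
```

Raising $p$ (and hence the expected sample size $pm$) drives the reported failure rate toward zero, in line with the $1 - e^{-c}$ guarantee.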


2.2 Words and distributions 

Notation for words Let $\Sigma$ be an alphabet, and let $w \in \Sigma^n$, $i \in [n]$ and $Q \subseteq [n]$. We use $w_i$
to denote the $i$'th letter of $w$ and $w_Q$ to denote the word $v \in \Sigma^{|Q|}$ such that, for every $j \in [|Q|]$,
$v_j = w_{Q(j)}$, where $Q(j)$ is the $j$'th smallest member of $Q$. Let $C \subseteq [n]$ and $\sigma$ be a word in $\Sigma^{|C|}$.
We denote by $w_{\sigma,C}$ the word that we get by taking $w$ and replacing its sub-word $w_C$ with $\sigma$.
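
The two operations just introduced, the sub-word $w_Q$ and the substituted word $w_{\sigma,C}$, can be sketched as follows. This is an illustration under stated assumptions: words are Python sequences, indices are 0-based (the paper indexes from 1), and the function names are hypothetical.

```python
def subword(w, Q):
    """Return w_Q: the letters of w at the indices in Q, in increasing index order."""
    return tuple(w[j] for j in sorted(Q))


def replace_subword(w, sigma, C):
    """Return w_{sigma,C}: w with its sub-word on C replaced by sigma."""
    positions = sorted(C)
    if len(sigma) != len(positions):
        raise ValueError("sigma must have exactly |C| letters")
    out = list(w)
    # The j'th letter of sigma goes to the j'th smallest member of C.
    for letter, j in zip(sigma, positions):
        out[j] = letter
    return tuple(out)


if __name__ == "__main__":
    w = "abcde"
    print(subword(w, {3, 0}))               # letters at indices 0 and 3
    print(replace_subword(w, "xy", {1, 4}))
```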

Definition 2.5 (word distances). Two words $w, v \in \Sigma^n$ are said to be $\epsilon$-far if there is no $A \subseteq [n]$
of size at most $\epsilon n$ for which $w_{[n] \setminus A} = v_{[n] \setminus A}$ (in other words, we use the normalized Hamming
distance). Otherwise these words are said to be $\epsilon$-close. Given a property $L \subseteq \Sigma^n$, a word $w$ is
said to be $\epsilon$-close to $L$ if there exists an $\epsilon$-close word $v$ which is in $L$, and otherwise $w$ is said
to be $\epsilon$-far from $L$.
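
Definition 2.5 translates directly into a count of differing positions. The sketch below assumes a property is given as an explicit finite collection of words, which is only feasible for toy examples; the function names are hypothetical.

```python
def hamming_count(w, v):
    """Number of positions in which two equal-length words differ."""
    if len(w) != len(v):
        raise ValueError("words must have the same length")
    return sum(1 for a, b in zip(w, v) if a != b)


def is_eps_far(w, v, eps):
    """True if no set A of size at most eps * n covers all differences of w and v."""
    return hamming_count(w, v) > eps * len(w)


def is_eps_far_from_property(w, L, eps):
    """True if w is eps-far from every word of the (explicitly listed) property L."""
    return all(is_eps_far(w, v, eps) for v in L)


if __name__ == "__main__":
    L = [(0, 0, 0, 0), (1, 1, 1, 1)]
    print(is_eps_far_from_property((0, 0, 1, 1), L, 0.25))
    print(is_eps_far_from_property((0, 0, 0, 1), L, 0.25))
```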


Notation for distributions We deal with distributions $\mu$ over subsets of $[n]$. For $A \subseteq [n]$
we denote by $\mu(A)$ the probability of $A$ being drawn by $\mu$. For a non-empty event, that is a
family of sets $\emptyset \neq \mathcal{A} \subseteq 2^{[n]}$, we abuse notation somewhat and denote $\mu(\mathcal{A}) = \sum_{A \in \mathcal{A}} \mu(A)$. We
denote by $\mathrm{Supp}(\mu)$ the family of positive probability outcomes $\{A \subseteq [n] : \mu(A) > 0\}$, and for
two distributions $\mu$ and $\mu'$ denote by $\mathrm{dist}(\mu, \mu')$ the variation distance
$\frac{1}{2}\sum_{A \subseteq [n]} |\mu(A) - \mu'(A)| = \max_{\mathcal{A} \subseteq 2^{[n]}} |\mu(\mathcal{A}) - \mu'(\mathcal{A})|$.
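
For small examples these notions can be computed explicitly. The sketch below assumes a distribution is stored as a dictionary from `frozenset` query sets to probabilities (a representation chosen here for illustration) and checks the two equivalent forms of the variation distance against each other; the function names are hypothetical.

```python
def support(mu):
    """Supp(mu): the query sets drawn with positive probability."""
    return {A for A, prob in mu.items() if prob > 0}


def event_probability(mu, family):
    """mu(family): total probability of the sets in the given family."""
    return sum(mu.get(A, 0.0) for A in family)


def variation_distance(mu, mu2):
    """Half the L1 distance between two distributions over subsets of [n]."""
    outcomes = set(mu) | set(mu2)
    return 0.5 * sum(abs(mu.get(A, 0.0) - mu2.get(A, 0.0)) for A in outcomes)


def max_event_gap(mu, mu2):
    """max over events of |mu(event) - mu2(event)|, attained where mu > mu2."""
    event = {A for A in set(mu) | set(mu2) if mu.get(A, 0.0) > mu2.get(A, 0.0)}
    return event_probability(mu, event) - event_probability(mu2, event)


if __name__ == "__main__":
    mu = {frozenset({0}): 0.5, frozenset({1}): 0.5}
    mu2 = {frozenset({0}): 0.25, frozenset({2}): 0.75}
    print(variation_distance(mu, mu2), max_event_gap(mu, mu2))
```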

2.3 Property Testing 

We start this subsection by defining tests. We define partial tests (of which tests are a special 
case), because we would like our main result to also have applications in the realm of MAPs
as defined in [11].

Definition 2.6 ($(\epsilon, \delta, q)$-test). Given two properties $L' \subseteq L \subseteq \Sigma^n$, a partial $(\epsilon, \delta, q)$-test for
$(L', L)$ is a randomized algorithm $\mathcal{A}$ that, given query access to the input $w$, uses $q$ queries and
satisfies the following:

1. If $w \in L'$, then $\Pr[\mathcal{A}(w) = 1] \ge 1 - \delta$.

2. If $w$ is $\epsilon$-far from $L$, then $\Pr[\mathcal{A}(w) = 0] \ge 1 - \delta$.

The test is 1-sided if, when $w \in L'$, the output is always 1, and otherwise it is 2-sided. If the
choice of every query is independent of the answers to the previous queries, then the test is
non-adaptive, and otherwise it is adaptive.

In the case where $L' = L$, we call it an $(\epsilon, \delta, q)$-test for $L$.

We remark that, in the case of a non-adaptive test, we may assume that the set of queries is
selected before any query is made. So, a non-adaptive test can be viewed as consisting of three
steps: (i) a set of queries $Q$ is randomly selected according to a distribution over subsets of $[n]$; (ii) the
sub-word $w_Q$ is queried; (iii) the output is computed according to $(Q, w_Q)$.

We note that a 1-sided test can reject only if $(Q, w_Q)$ constitutes a proof that $w$ is not in
the property. This occurs if and only if $Q$ is a witness against $w$ as defined next.

Definition 2.7 (witness against a word). A set $Q \subseteq [n]$ is a witness against a word $w \in \Sigma^n$
(with regards to a property $L$), if every $u$ such that $u_Q = w_Q$ is not in $L$.
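
For a property that is small enough to list explicitly, Definition 2.7 can be checked by brute force: $Q$ is a witness against $w$ exactly when no word of $L$ agrees with $w$ on all of $Q$. The sketch below makes that assumption (an explicitly listed $L$, 0-based indices) and uses a hypothetical function name.

```python
def is_witness(Q, w, L):
    """True if every word agreeing with w on Q lies outside L.

    Equivalently: each word u in L differs from w somewhere inside Q.
    """
    return all(any(u[j] != w[j] for j in Q) for u in L)


if __name__ == "__main__":
    # Toy property: the two constant words of length 3.
    L = [(0, 0, 0), (1, 1, 1)]
    w = (0, 1, 0)
    print(is_witness({0, 1}, w, L))  # seeing 0 then 1 rules out both constant words
    print(is_witness({0}, w, L))     # a single 0 is consistent with (0, 0, 0)
```

Note that the empty set is a witness only when $L$ itself is empty, and that any superset of a witness is again a witness.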

Without loss of generality, we assume that a test always rejects when it encounters a witness 
against the word. In the case of a 1-sided tester it is actually the case that the test rejects only 
if it encountered a witness. We next formally define the concept of the distribution of a
non-adaptive test.

Definition 2.8 (distribution of a non-adaptive $(\epsilon, \delta)$-test). The distribution of a non-adaptive
$(\epsilon, \delta)$-test $\mathcal{A}$, denoted by $\mu_{\mathcal{A}}$, is a distribution over subsets of $[n]$ such that, for every $Q \subseteq [n]$, the value
of $\mu_{\mathcal{A}}(Q)$ is the probability that $\mathcal{A}$ will select $Q$ to be its set of queries. We omit the subscript when
it is clear from context.

Our conversion results rely on the combinatorial aspects of distributions of tests. In fact, 
for non-adaptive 1-sided tests without loss of generality this distribution is the sole defining 
object, because the test can be assumed to reject if and only if its query set produced a witness 
against the input word. In particular, we show a reduction to the case where the cardinality 
of the support of the distribution has a bound linear in n. We use the following definition to 
capture this case and afterwards we give the reduction. 

Definition 2.9 (non-adaptive $(\epsilon, \delta, q, k)$-test). A non-adaptive $(\epsilon, \delta, q)$-test is an $(\epsilon, \delta, q, k)$-test
if $|\mathrm{Supp}(\mu)| \le k$.

We observe that the support of the distribution of an $(\epsilon, \delta, q)$-test contains only sets of
cardinality $q$. We use the term $(\epsilon, \delta)$-test (omitting $q$) when we do not make any assumption on
the cardinality of the sets in the distribution. The following lemma transforms a 1-sided test to
one with parameters more suitable for analysis and conversion to sample-based testing.

Lemma 2.10. A non-adaptive 1-sided $(\epsilon/2, \delta, q)$-test can be converted to a non-adaptive 1-sided
$(\epsilon/2, 1/2(q'+1), q', 4(q'+1)^2 \log(|\Sigma|) n)$-test where $q' = O(q \log(q)/(1 - \delta))$.

Proof. First, by traditional amplification, repeating the original test $10\log(q')/(1 - \delta)$ times
and rejecting if any run had rejected, we convert it to an $(\epsilon/2, 1/1000q'', q'')$-test where $q'' =
O(q \log(q)/(1 - \delta))$. Then we consider the outcome of running the test $10\log(|\Sigma|)q''n$ times
independently. By Lemma 2.1, for any fixed $\epsilon/2$-far input $w \in \Sigma^n$, the probability that it is
accepted by more than a $1/10q''$ fraction of the runs is bounded by $[\ldots]$.
This means that with probability at least $[\ldots]$ such a sequence of runs will satisfy the above for
all $\epsilon/2$-far inputs at once. We fix such a sequence of runs, and make it the new test. That is, the
new $\mu'$ consists of selecting one of the fixed runs uniformly at random, and using its query set.
This brings us to an $(\epsilon/2, 1/10q'', q'', 10\log(|\Sigma|)q''n)$-test. We artificially increase the number of
queries to $q' = 3q''$ to obtain our required test. □
The following two lemmas are essential for the analysis of 1-sided tests. When reading them,
one should have in mind that, when they are applied, the $\delta$ parameter in their statement is
small since the tests analyzed are those implied by Lemma 2.10.

Lemma 2.11. Let $\mathcal{J} \subseteq \mathrm{Supp}(\mu)$, where $\mu$ is the distribution of a 1-sided $(\epsilon/2, \delta)$-test for a
non-empty property $L$ for which $\epsilon$-far words exist. If $|\bigcup_{Q \in \mathcal{J}} Q| < \epsilon n/2$, then $\mu(\mathcal{J}) \le \delta$.

Proof. Let $T = \bigcup_{Q \in \mathcal{J}} Q$, $u \in \Sigma^n$ be a word in $L$, $w \in \Sigma^n$ be $\epsilon$-far from $L$, and $v \in \Sigma^n$ be
such that $v_T = u_T$ and $v_{[n] \setminus T} = w_{[n] \setminus T}$. Assume that $|T| < \epsilon n/2$. Then, by the triangle
inequality, $v$ is $\epsilon/2$-far from $L$.

Considering a 1-sided test of $v$ with distribution $\mu$, we first note that no member of $\mathcal{J}$ is a
witness against $v$. Thus, $\mu(\mathcal{J})$ is at most 1 minus the probability of $\mu$ obtaining a witness. As
$v$ is $\epsilon/2$-far from $L$, the probability of obtaining a witness by $\mu$ is at least $1 - \delta$, implying that
$\mu(\mathcal{J}) \le \delta$. □

Lemma 2.12. For $\mu$, which is the distribution of a 1-sided $(\epsilon/2, \delta)$-test for a non-empty property
$L$ for which $\epsilon$-far words exist, let $w$ be $5\epsilon/6$-far from $L$ and $\mathcal{J} \subseteq \mathrm{Supp}(\mu)$. If $\mu(\mathcal{J}) > 2\delta$,
then the set $\mathcal{S}$ of all $Q \in \mathcal{J}$ which are witnesses against $w$ satisfies $|\bigcup_{Q \in \mathcal{S}} Q| \ge \epsilon n/2$.

Proof. Let $\mathcal{S} \subseteq \mathcal{J}$ be the subset of witnesses against $w$ as in the formulation of the lemma.
Since $w$ is $5\epsilon/6$-far from $L$, the distribution $\mu$ provides a witness against $w$ with probability at
least $1 - \delta$, and therefore $\mu(\mathcal{S}) > \delta$. Consequently, by Lemma 2.11, $|\bigcup_{Q \in \mathcal{S}} Q| \ge \epsilon n/2$. □

Definition 2.13 ($p$-sampling $(\epsilon, \delta)$-test). A $p$-sampling test for a property $L$ is an $(\epsilon, \delta)$-test
such that every $i \in [n]$ is selected as a query, independently, with probability $p$; in other words,
it is a sample-based test with probability $p$ as defined in [9]. A $p$-sampling test is 1-sided if
every word in the property is accepted with probability 1 and otherwise it is 2-sided. We use the
notation $\mu_p$ to denote the distribution of the $p$-sampling test.
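
A 1-sided $p$-sampling test can be sketched for toy properties as follows. This is a brute-force illustration, not an efficient tester: it assumes the property is listed explicitly, it rejects exactly when the sampled index set is a witness against the input, and all function names are hypothetical.

```python
import random


def is_witness(Q, w, L):
    """True if no word of L agrees with w on every index in Q."""
    return all(any(u[j] != w[j] for j in Q) for u in L)


def p_sample(n, p, rng):
    """Draw a query set from mu_p: each index is kept independently with probability p."""
    return {j for j in range(n) if rng.random() < p}


def p_sampling_test(w, L, p, rng):
    """Return True (accept) unless the sampled indices form a witness against w."""
    R = p_sample(len(w), p, rng)
    return not is_witness(R, w, L)


if __name__ == "__main__":
    rng = random.Random(0)
    L = [(0,) * 8, (1,) * 8]      # toy property: constant words of length 8
    far = (0, 1) * 4              # differs from each constant word in half the positions
    rejections = sum(not p_sampling_test(far, L, 0.5, rng) for _ in range(1000))
    print(rejections / 1000)
```

Because the query set does not depend on $L$, the same sample `R` could be re-used to test many properties at once, which is the point of the canonical scheme; words in $L$ are never rejected, so the sketch is 1-sided.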


3 A conversion of a 1-sided test to a 1-sided sampling test 


We show here that if a property is testable with 1-sided error, then it has a $p$-sampling 1-sided
$(\epsilon, \delta)$-test with $p$ corresponding to some negative power of $n$. Specifically, we prove the following
theorem, which as we explain immediately afterwards implies our claimed result.


Theorem 3.1. For every $n \ge (24q(q+1)^2(\log(|\Sigma|))^2/\epsilon)^q$, if a property over $\Sigma^n$ has a 1-sided
$(\epsilon/2, 1/2(q+1), q, 4(q+1)^2\log(|\Sigma|)n)$-test, then it also has a $p$-sampling 1-sided $(\epsilon, 1/2)$-test
such that $p = O(\log(|\Sigma|)q^3 n^{-1/q^2}/\epsilon)$.


The preceding theorem is effective for all properties with 1-sided $(\epsilon/2, \delta, q)$-tests, since, by
Lemma 2.10, such tests can be converted to an $(\epsilon/2, 1/2(q'+1), q', 4(q'+1)^2\log(|\Sigma|)n)$-test,
where $q'$ is bounded by a polynomial in $q$ and $1/(1 - \delta)$.

We next sketch a proof that the statement of Theorem 3.1 holds for every test that satisfies
the additional constraint that it has a distribution $\mu$ such that $\mathrm{Supp}(\mu)$ consists of pairwise
disjoint sets. The main result of this section can be interpreted as a reduction to this simple
case.

Suppose that $\mu$ is a distribution of a 1-sided $(\epsilon/2, 1/2(q+1), q, 4(q+1)^2\log(|\Sigma|)n)$-test for
$L$. Let $w$ be $\epsilon$-far from a property $L$, and $\mathcal{B}$ be the family of all the sets in $\mathrm{Supp}(\mu)$ that are
witnesses against $w$. Now note that if $|\mathcal{B}|$ is sufficiently large, then using the fact that these sets
are pairwise disjoint it is easy to show that, with probability at least 1/2, the set of queries used
by a $p$-sampling test contains at least one of these sets. This in turn implies that a $p$-sampling
test rejects $w$ with probability at least 1/2. We next explain why $|\mathcal{B}|$ is indeed sufficiently large.
Let $B$ be the union of all the sets in $\mathcal{B}$. By definition, the test rejects $w$ with probability at least
1/2, and therefore $\mu(\mathcal{B}) \ge 1/2$. Thus, by Lemma 2.11, $|B| \ge \epsilon n/2$ and hence $|\mathcal{B}| \ge \epsilon n/(2q)$.
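
The "easy to show" step in the disjoint case is a short calculation: a fixed $q$-set is fully sampled with probability $p^q$, and disjoint sets are sampled independently. The sketch below works this out numerically; the choice of $p$ that makes $p^q \cdot |\mathcal{B}| = \ln 2$ is an assumption made for illustration, not the parameter used in Theorem 3.1, and the function names are hypothetical.

```python
import math


def hit_probability(p, q, num_sets):
    """Probability that a p-sample fully contains at least one of
    num_sets pairwise disjoint sets of size q."""
    return 1.0 - (1.0 - p ** q) ** num_sets


def sufficient_p(q, num_sets):
    """A p with p**q * num_sets = ln 2, which pushes the hit probability to >= 1/2
    because (1 - x)**m <= exp(-x * m)."""
    return min(1.0, (math.log(2) / num_sets) ** (1.0 / q))


if __name__ == "__main__":
    n, q, eps = 10**6, 3, 0.1
    num_sets = int(eps * n / (2 * q))   # the bound |B| >= eps * n / (2q) from the text
    p = sufficient_p(q, num_sets)
    print(p, hit_probability(p, q, num_sets))
```

With $|\mathcal{B}| \ge \epsilon n/(2q)$ this $p$ scales like $n^{-1/q}$, matching the belief stated in the introduction that $1/q$ is the natural exponent for the disjoint case.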

When the sets in $\mathrm{Supp}(\mu)$ are not pairwise disjoint the preceding idea does not work, for
example, in the case that the intersection of all the sets in $\mathrm{Supp}(\mu)$ is exactly of
size 1. Here, with high probability, a set of queries selected at random according to $\mu_p$ does
not contain a set from $\mathrm{Supp}(\mu)$ that is a witness against $w$. Thus, we can't conclude that a
$p$-sampling test rejects $w$ with the required probability. We now explain how to circumvent this
barrier in two steps: in the first step we assume that $C$, the intersection of sets in $\mathrm{Supp}(\mu)$,
is significantly smaller than $\epsilon n/2$ and that we are given $w_C$ in advance; in the second step we
show what to do when $w_C$ is not known in advance.

Let $\mathcal{B}$ and $B$ be as defined in the simple case. In the same manner as the simple case, we
can conclude that $|B| \ge \epsilon n/2$. Let $\mathcal{M}$ be the family of all non-empty sets $Q \setminus C$ such that
$Q \in \mathrm{Supp}(\mu)$. We note that the size of $\mathcal{M}$ is $O(\epsilon n/q)$ since, by construction, the size of the
union of these sets is $|B| - |C| = O(\epsilon n/q)$. It is thus easy to show that, with high probability,
a set of queries selected at random according to $\mu_p$ together with $C$ contains a witness against
$w$, and hence a $p$-sampling test, with the advance knowledge of $w_C$, rejects $w$ with the required
probability. The combinatorial structure consisting of the set $C$ and the family $\mathcal{B}$ is captured
by the following definition:

Definition 3.2 ($i$-pompom). A family of sets $\mathcal{S}$ is an $i$-pompom if there exists a set $C$, which
we refer to as the core of the $i$-pompom, such that the following hold.

1. $|Q \setminus C| = i$ for every $Q \in \mathcal{S}$.

2. $Q \setminus C$ and $Q' \setminus C$ are disjoint for every distinct $Q$ and $Q'$ in $\mathcal{S}$.
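
The two conditions of Definition 3.2 are easy to check mechanically for a candidate core. The following sketch does so for a family given as Python sets; the function name is hypothetical, and the example illustrates that the core may be strictly larger than the common intersection of the family (here the intersection is empty), which is what separates pompoms from sunflowers.

```python
def is_pompom(family, core, i):
    """Check whether `family` is an i-pompom with the given core."""
    core = frozenset(core)
    # Work with distinct sets only, as the definition quantifies over distinct Q, Q'.
    petals = [Q - core for Q in {frozenset(Q) for Q in family}]
    # Condition 1: every set has exactly i elements outside the core.
    if any(len(P) != i for P in petals):
        return False
    # Condition 2: the parts outside the core are pairwise disjoint.
    seen = set()
    for P in petals:
        if seen & P:
            return False
        seen |= P
    return True


if __name__ == "__main__":
    family = [{0, 1, 5}, {0, 2, 6}, {1, 2, 7}]
    print(is_pompom(family, core={0, 1, 2}, i=1))  # petals {5}, {6}, {7}
    print(is_pompom(family, core={0}, i=2))        # third set has 3 elements outside {0}
```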

The restriction of the cardinality of the sets $Q \setminus C$ is required to support technical
computations in the proofs. We next explain how the above idea works when $w_C$ is not given in
advance.

If, given a word $w$, we have a large pompom with core $C$ that is additionally made up of
witnesses against $w_{\sigma,C}$ for some $\sigma \in \Sigma^{|C|}$, then similarly to the simple case described above, the
sampling distribution will produce a set showing that $w$ is not in the property unless $w_C \neq \sigma$.
This simple observation is the motivation for the following setting. Suppose that there existed a
set $C$ and a set of families $\{\mathcal{S}_\sigma\}_{\sigma \in \Sigma^{|C|}}$ such that for every $\sigma \in \Sigma^{|C|}$, $\mathcal{S}_\sigma$ is an $i$-pompom consisting
of witnesses against $w_{\sigma,C}$ and has $C$ as a core. Now, if the cardinality of $C$ is sufficiently small
and the cardinality of every $i$-pompom is sufficiently large, then we can prove the following:
with high probability, a set of queries selected at random according to $\mu_p$ contains a set of
queries whose values rule out any possible value of $w_C$, and hence imply that $w$ is not in the
property. We refer to such a set of queries as a super-witness.

Definition 3.3 (super-witness against a word). We say that $X \subseteq [n]$ is a super-witness against
a word $w \in \Sigma^n$, if there exists a set $Y \subseteq [n] \setminus X$ such that, for every $\sigma \in \Sigma^{|Y|}$, there exists a set
$Q \subseteq X \cup Y$ which is a witness against $w_{\sigma,Y}$.

Recall that the set $X$ in the above definition does not necessarily contain any set from
$\mathrm{Supp}(\mu)$. However, as we prove next, it is sufficient to imply that $w$ is not in the property.

Observation 3.4. Any set containing a witness against a word $w$ is also a witness against it.
Additionally, a set is a super-witness against $w$ if and only if it is a witness against it.

Proof. The part about containing sets follows immediately from the definition. Additionally, a
witness $X$ is also a super-witness by setting $Y = \emptyset$. Now let $X \subseteq [n]$ be a super-witness against
$w \in \Sigma^n$. By the definition of a super-witness, for every $u \in \Sigma^n$ such that $u_X = w_X$, there exists
a witness against $u$ (some subset of $X \cup Y$) and hence $u \notin L$. Thus $X$ is a witness against $w$
with regards to $L$. □

We now formally define the type of set of i-pompoms that we need for our result. 

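Definition 3.5 (revealing set of $i$-pompoms for $w$). Let $w$ be any word in $\Sigma^n$. A set of $i$-pompoms
for $w$ is revealing if there exists $C \subseteq [n]$ of cardinality bounded above by $4q(q+1)^2\log(|\Sigma|)n^{1-i/q}$,
such that, for every $\sigma \in \Sigma^{|C|}$, the set contains an $i$-pompom $\mathcal{S}_\sigma$ that satisfies:

1. $C$ is a core of $\mathcal{S}_\sigma$.

2. $\mathcal{S}_\sigma$ consists only of witnesses against $w_{\sigma,C}$.

3. $|\mathcal{S}_\sigma| \ge (1/3i)\epsilon \cdot n^{1-(i-1)/q}$.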


The bounds on the cardinality of the core and the i-pompoms, follow from the combinatorial 
construction we use in order to prove their existence. We show next that they are sufficient for 
our purposes and afterwards present the combinatorial construction. 


Lemma 3.6. Let $n \ge (24q(q+1)^2(\log(|\Sigma|))^2/\epsilon)^q$, $\alpha = 15\ln|\Sigma| \cdot q(q+1)^2/\epsilon$ and $w \in \Sigma^n$. If there
exists a revealing set of $i$-pompoms for $w$ with a core $C$, then the $\alpha \cdot n^{-1/q^2}$-sampling algorithm
rejects $w$ with probability at least $1/2$.

Proof. Let $\mathcal{B}_\sigma$ be the set $\{Q \setminus C : Q \in \mathcal{S}_\sigma\}$. We observe that $|\mathcal{B}_\sigma| = |\mathcal{S}_\sigma|$. Let $R \subseteq [n]$ be the
set of indexes sampled by the $\alpha \cdot n^{-1/q^2}$-sampling algorithm.

Referring to Calculation 1 (see appendix), for every $\sigma \in \Sigma^{|C|}$, the probability that $R$ does
not contain a set from $\mathcal{B}_\sigma$ is at most $[\ldots]$.
Thus, by the union bound, with probability at least $1/2$, for every $\sigma \in \Sigma^{|C|}$ the set $R$
contains a set from $\mathcal{B}_\sigma$. We note that this means that $R$ is a super-witness against $w$, which by
Observation 3.4 makes it a witness against $w$. Thus the $\alpha \cdot n^{-1/q^2}$-sampling algorithm rejects
$w$ with probability at least $1/2$ and hence the statement of the lemma follows. □


We now show that, for every property $L \subseteq \Sigma^n$ that has a 1-sided
$(\epsilon/2, 1/2(q+1), q, 4(q+1)^2\log(|\Sigma|)n)$-test, there exists a revealing set of $i$-pompoms for $w$. We first show that if $\mathrm{Supp}(\mu)$
has a family of sets $\mathcal{S}$ that is almost an $i$-pompom, then for every $w$ that is $\epsilon$-far from $L$, $\mathcal{S}$
admits a revealing set of $i$-pompoms. By “almost” we mean that there exists a set $C$, which
satisfies that every $j \in (\bigcup_{Q \in \mathcal{S}} Q) \setminus C$ is not in too many of the sets in $\mathcal{S}$ (where for a true pompom
every such $j$ would be in exactly one set).

First, let us formally define the “almost-pompom” sets; this definition will also serve in the
proof of the conversion for 2-sided tests.


Definition 3.7 (constellation). For $i \in [q]$, a distribution $\mu$ over subsets of $[n]$ of size $q$, and
any positive number $\eta$, an $(\eta, i)$-constellation is a pair $(C, \mathcal{S})$ consisting of a set $C \subseteq [n]$ and a
family $\mathcal{S} \subseteq \mathrm{Supp}(\mu)$ satisfying the following.


1. $|C| \le \eta n^{1-i/q}$.

2. $\mu(\mathcal{S}) > 1/(q+1)$.

3. $|Q \cap C| = q - i$, for every $Q \in \mathcal{S}$.

4. Every $j \in \bigcup_{Q \in \mathcal{S}} Q \setminus C$ is in at most $n^{(i-1)/q}$ sets from $\mathcal{S}$ if $i > 1$.

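Lemma 3.8. Let $i \in [q]$, $n \ge (24q(q+1)^2(\log(|\Sigma|))^2/\epsilon)^q$, $w$ be any word in $\Sigma^n$ that is $\epsilon$-far
from $L$, and $\mu$ be the distribution of a 1-sided $(\epsilon/2, 1/2(q+1), q, 4(q+1)^2\log(|\Sigma|)n)$-test for $L$.
If there exists a $(4q(q+1)^2\log(|\Sigma|), i)$-constellation for $\mu$, then there exists a revealing set of
$i$-pompoms for $w$.

Proof. Let $(C, \mathcal{S})$ be the constellation, and let $\sigma$ be an arbitrary word in $\Sigma^{|C|}$. Since
$n \ge (24q(q+1)^2(\log(|\Sigma|))^2/\epsilon)^q$, $|C| \le \epsilon n/6$,
and hence by the triangle inequality $w_{\sigma,C}$ is $5\epsilon/6$-far from $L$.

Let $\mathcal{S}_\sigma \subseteq \mathcal{S}$ be the set of all $Q \in \mathcal{S}$ which are witnesses against $w_{\sigma,C}$. Since $\mu(\mathcal{S}) > 1/(q+1)$, by
Lemma 2.12 we have $|\bigcup_{Q \in \mathcal{S}_\sigma} Q| \ge \epsilon n/2$. Let $\mathcal{B}_\sigma$ be the set $\{Q \setminus C : Q \in \mathcal{S}_\sigma\}$. We observe that
$|\bigcup_{Q \in \mathcal{B}_\sigma} Q| \ge |\bigcup_{Q \in \mathcal{S}_\sigma} Q| - |C| \ge \epsilon n/3$, because $|C| \le \epsilon n/6$.

Suppose first that $i = 1$. We let $\mathcal{S}'_\sigma \subseteq \mathcal{S}_\sigma$ be maximal so that, for every $Q \in \mathcal{S}'_\sigma$, $Q \setminus C$ is
distinct and a member of $\mathcal{B}_\sigma$. Clearly, $|\mathcal{S}'_\sigma| \ge (1/3)\epsilon \cdot n$, $\mathcal{S}'_\sigma$ is a 1-pompom, and $C$ is a core of
$\mathcal{S}'_\sigma$.

Suppose now that $i > 1$. Then, there exists $\mathcal{B}'_\sigma \subseteq \mathcal{B}_\sigma$ such that every pair of sets in $\mathcal{B}'_\sigma$ is
disjoint and $|\mathcal{B}'_\sigma| \ge (1/3i)\epsilon \cdot n^{1-(i-1)/q}$, because every $j \in \bigcup_{Q \in \mathcal{B}_\sigma} Q \setminus C \subseteq \bigcup_{Q \in \mathcal{S}} Q \setminus C$ is in at most
$n^{(i-1)/q}$ sets from $\mathcal{S}$. We let $\mathcal{S}'_\sigma \subseteq \mathcal{S}_\sigma$ be maximal so that, for every $Q \in \mathcal{S}'_\sigma$, $Q \setminus C$ is a distinct
member of $\mathcal{B}'_\sigma$. Clearly, $|\mathcal{S}'_\sigma| \ge (1/3i)\epsilon \cdot n^{1-(i-1)/q}$, $\mathcal{S}'_\sigma$ is an $i$-pompom, and $C$ is a core of $\mathcal{S}'_\sigma$.

We observe that the above implies that for every $\sigma \in \Sigma^{|C|}$, $C$ is a core of $\mathcal{S}'_\sigma$, $\mathcal{S}'_\sigma$ consists
only of witnesses against $w_{\sigma,C}$, and $|\mathcal{S}'_\sigma| \ge (1/3i)\epsilon \cdot n^{1-(i-1)/q}$. Thus, by definition, $\{\mathcal{S}'_\sigma\}_{\sigma \in \Sigma^{|C|}}$
is a revealing set of $i$-pompoms for $w$. □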

We now prove that any distribution $\mu$, over subsets of $[n]$ of size $q$, which satisfies $|\mathrm{Supp}(\mu)| \le \eta n$,
admits an $(\eta, i)$-constellation $(\mathcal{S}, C)$ for some $i \in [q]$, as long as a certain subset of $\mathrm{Supp}(\mu)$ (defined
below) is not too heavy. This together with Lemma 3.6 and Lemma 3.8, substituting $\eta = 4q(q+1)^2\log(|\Xi|)$
and referring to the distribution $\mu$ of the $(\epsilon/2, 1/2(q+1), q, 4(q+1)^2\log(|\Xi|)n)$-test,
is sufficient for the proof of Theorem 3.1.

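We start by defining three sequences of sets, $\{\mathcal{M}_i\}_{i=0}^{q}$, $\{C_i\}_{i=0}^{q}$ and $\{\mathcal{S}_i\}_{i=0}^{q}$, where $\{\mathcal{S}_i\}_{i=0}^{q}$ is
a partition of $\mathrm{Supp}(\mu)$. We prove afterwards that, as long as $\mu(\mathcal{S}_0) \le 1/(q+1)$, for some $i \in [q]$ the
sets $\mathcal{S}_i$ and $C_i$ respectively compose the claimed constellation $(\mathcal{S}, C)$.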

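Definition 3.9 ($\mathcal{M}_i$, $C_i$ and $\mathcal{S}_i$). Given a distribution $\mu$ over subsets of $[n]$ of size $q$ whose
support size is bounded by $\eta n$, we inductively define $\mathcal{M}_i$, $C_i$ and $\mathcal{S}_i$ as follows.

1. Let $\mathcal{M}_0 = \mathrm{Supp}(\mu)$, and $C_0$ be the set of indexes $j \in [n]$ such that $j$ is a member of at
least $n^{1/q}$ sets in $\mathcal{M}_0$.

2. For $i = 0, 1, \ldots, q$, after $\mathcal{M}_i$ and $C_i$ are defined, let $\mathcal{S}_i$ be all the sets $Q \in \mathcal{M}_i$ such that
$|Q \cap C_i| = q - i$.

3. For $i = 1, \ldots, q$, after $\mathcal{M}_i$ is defined, let $C_i$ be the set of indexes $j \in [n]$ such that $j$ is a
member of at least $n^{i/q}$ sets in $\mathcal{M}_i$.

4. For $i = 1, \ldots, q$, after $\mathcal{M}_{i-1}$ and $\mathcal{S}_{i-1}$ are defined, let $\mathcal{M}_i = \mathcal{M}_{i-1} \setminus \mathcal{S}_{i-1}$.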
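
The inductive construction above can be sketched in code. This is only an illustrative sketch: the function name `build_families` and the data representation (query sets as `frozenset`s) are assumptions of this example, and the membership thresholds are read here as $n^{1/q}$ for $C_0$ and $n^{i/q}$ for $C_i$ with $i \ge 1$, which is the reading consistent with the size bounds of Observation 3.10. A different threshold function can be passed in.

```python
from collections import Counter

def build_families(support, n, q, threshold=None):
    """Return lists (M, C, S), indexed by i = 0..q, in the spirit of Definition 3.9.

    support: iterable of frozensets, each a query set of size q over range(n).
    threshold: function i -> minimal number of sets an index must belong to
               in order to enter C_i (hypothetical parameter of this sketch).
    """
    if threshold is None:
        # Assumed thresholds: n^(1/q) for i = 0 and n^(i/q) for i >= 1.
        threshold = lambda i: n ** (max(i, 1) / q)
    M, C, S = [], [], []
    current = set(support)                      # M_0 = Supp(mu)
    for i in range(q + 1):
        counts = Counter(j for Q in current for j in Q)
        C_i = {j for j, c in counts.items() if c >= threshold(i)}
        S_i = {Q for Q in current if len(Q & C_i) == q - i}
        M.append(set(current))
        C.append(C_i)
        S.append(S_i)
        current = current - S_i                 # M_{i+1} = M_i \ S_i
    return M, C, S
```

On a small support one can check directly that the families $\mathcal{S}_0, \ldots, \mathcal{S}_q$ partition the support and that $(Q \setminus C_i) \cap C_{i-1} = \emptyset$ for every $Q \in \mathcal{S}_i$, which is what the lemmas below establish in general.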

The following statements give the properties of these sets. 

Observation 3.10. The following hold for the sets of Definition 3.9 when they are constructed
from a distribution $\mu$ satisfying the conditions there.

1. $|C_0| \le \eta q n^{1-1/q}$.

2. $|C_i| \le \eta q n^{1-i/q}$ for all $1 \le i \le q$.

3. $\mathcal{M}_i \subseteq \mathcal{M}_{i-1}$ for all $1 \le i \le q$.

4. $C_i \subseteq C_{i-1}$ for all $1 \le i \le q$.

5. $\mathcal{S}_i \cap \mathcal{S}_j = \emptyset$ for all $1 \le i < j \le q$.

Proof. Item 3 follows immediately from the construction, and implies Item 4 in turn. Item 1
follows from the definition along with the assumption on the support size of $\mu$, and so does Item
2 using Item 3. For Item 5 assume that $i < j$ and note the construction of $\mathcal{M}_{i+1}$, which makes
it disjoint from $\mathcal{S}_i$ while containing $\mathcal{S}_j$. □

The goal of the following two lemmas is to prove that, for every $i \in [q]$ and $Q \in \mathcal{S}_i$, we
have that $(Q \setminus C_i) \cap C_{i-1} = \emptyset$. According to Definition 3.9 this implies that $\mathcal{M}_i$ and $\mathcal{S}_i$ satisfy
Condition 4 of the constellation in Lemma 3.8. At a high level of abstraction the proof starts with the assumption
that for some $i \in [q]$ there exists $Q \in \mathcal{S}_i$ such that $(Q \setminus C_i) \cap C_{i-1} \ne \emptyset$; afterwards it is shown that
this $Q$ is in $\mathcal{S}_j$ for some $j \in [i-1]$; this by Definition 3.9 implies that $Q \notin \mathcal{S}_i$, in contradiction
to the assumption that $Q \in \mathcal{S}_i$. The following lemma is used in order to restrict the setting to
that where $j = i - 1$.

Lemma 3.11. For $i = 0, 1, \ldots, q$ and every $Q \in \mathcal{M}_i$ we have that $|Q \cap C_i| \le q - i$.

Proof. By definition, $|Q| \le q$ for every $Q \in \mathcal{M}_0$. Hence, $|Q \cap C_0| \le q - 0 = q$. We proceed by
induction over $i$. Assume that the statement of the lemma holds for $i - 1 \ge 0$. Suppose for the
sake of contradiction that there exists $Q \in \mathcal{M}_i$ such that $|Q \cap C_i| > q - i$. Since $C_i \subseteq C_{i-1}$, by Item
4 of Observation 3.10, this implies that $|Q \cap C_{i-1}| \ge q - (i-1)$. Hence, $|Q \cap C_{i-1}| = q - (i-1)$,
by the induction assumption. Therefore, $Q \in \mathcal{S}_{i-1}$, because, by construction, we also have that
$Q \in \mathcal{M}_{i-1}$. Consequently, we get the contradiction that $Q \notin \mathcal{M}_i$, since $\mathcal{M}_i = \mathcal{M}_{i-1} \setminus \mathcal{S}_{i-1}$. □

Lemma 3.12. For $i = 1, \ldots, q$ and every $Q \in \mathcal{S}_i$ we have that $(Q \setminus C_i) \cap C_{i-1} = \emptyset$.

Proof. We proceed by induction over $i$. The base case is $i = 0$, which follows from the definition
of $\mathcal{S}_0$, even if we set $C_{-1} = [n]$. Assume that the statement of the lemma holds for $i - 1$. Suppose
for the sake of contradiction that there exists $Q \in \mathcal{S}_i$ such that $|(Q \setminus C_i) \cap C_{i-1}| > 0$. Thus,
$|Q \cap C_{i-1}| = |(Q \setminus C_i) \cap C_{i-1}| + |Q \cap C_i| \ge q - (i-1)$, because $C_i \subseteq C_{i-1}$ by Item 4 of Observation
3.10. Therefore, by Lemma 3.11, $|Q \cap C_{i-1}| = q - (i-1)$. Since by construction we also have
that $Q \in \mathcal{M}_{i-1}$, we deduce that $Q \in \mathcal{S}_{i-1}$. Consequently, we get the contradiction that $Q \notin \mathcal{S}_i$,
since $\mathcal{S}_i \subseteq \mathcal{M}_i = \mathcal{M}_{i-1} \setminus \mathcal{S}_{i-1}$. □

These last lemmas imply that if $\mu(\mathcal{S}_0)$ is not large, then a constellation exists.

Lemma 3.13. If $\mu(\mathcal{S}_0) \le 1/(q+1)$, then for some $i \in [q]$ the pair $(\mathcal{S}_i, C_i)$ is an $(\eta q, i)$-constellation
for $\mu$.

Proof. By the assumption $\mu(\mathcal{S}_0) \le 1/(q+1)$, averaging and Item 5 of Observation 3.10, there
exists $i \in [q]$ such that $\mu(\mathcal{S}_i) \ge 1/(q+1)$. Consequently, following from Observation 3.10 Items
1 and 2, the choice of $i$, Definition 3.9 Item 2, and Lemma 3.12 together with Definition 3.9 Items
1 and 3, in that order, $C_i$ and $\mathcal{S}_i$ satisfy the four conditions of Definition 3.7 and thus form an
$(\eta q, i)$-constellation. □

We now prove the 1-sided test conversion result. 
Proof of Theorem \&.l[ Given a distribution fj. which corresponds to a 1-sided {e/2,l/2{q +
l),q,4:{q + 1)^ log(|H|)n)-test for a property over H”, where n > (24g(g -|- l)^(log(|H|))^/e)'^, 
we show for a = 15 In |H| • q{q + l)^/e that the a ■ -sampling distribution corresponds to 

a 1 -sided (e, l/ 2 )-test for the same property. 

From definition l3.9l every set in 5o is in particular a subset of Cq. Hence, the union of all the 
sets in <So is also a subset of Cq and therefore | UgeSo — l^ol- Thus, by Item [T] of Observation 
Ml lUge^o'SI < |Co| < 4g(g + l) 2 log(|H|)ni-i/'' < en/ 2 , where the last inequality follows 
from n > {24:q{q -|- l)^(log(|H|))^/e)'?. Therefore, by Lemma [2.111 /i(<So) < !/((? + 1), and so by 
Lemma 13.131 there exists an ( 4 g ((7 -|- 1)^ log(|H|), i)-constellation for some i (z Q. Therefore, by 
Lemma 13.81 there exists a revealing set of i-pompoms for w. Consequently, by Lemma 13.61 the 
theorem follows. □ 

4 Probabilistic formulas and test combinatorialization 

Here we take a non-adaptive 2-sided test and make its structure more malleable to combinatorial 
arguments, with the main feature being that the new query distribution will be uniform over its 
support. At first, we define a structure that can generally describe tests; we use this formulation 
to make the following arguments clearer and more succinct, which will also present them in their 
fullest possible generality. 

Definition 4.1 (probabilistic constraints and formulas). A probabilistic $q$-constraint (over an
alphabet $\Xi$) is a pair $C = (Q, S)$ where $Q \subseteq [n]$ is a constraint set, also called a query set, of
size $q$, and $S$ is a satisfaction function from $\Xi^q$ to the real interval $[0,1]$.

A probabilistic $q$-formula $P = (\mathcal{F}, \mu)$ is a set $\mathcal{F}$ of $q$-constraints, all with distinct constraint
sets, along with a probability distribution $\mu$ over $\mathcal{F}$. We call it a $(q,k)$-formula if additionally
$|\mathrm{Supp}(\mu)| \le k$, in which case we can assume that $|\mathcal{F}| \le k$.

When we drop the restriction on the sizes of the query sets of the constraints (even the
restriction that they are all of the same size) then we call $P$ a probabilistic formula.

Given a word $w \in \Xi^n$ and a probabilistic formula $P$, the satisfaction of $P$ by $w$ is the
average of the random variable that results from picking a constraint $(Q, S) \in \mathcal{F}$ according to $\mu$,
and obtaining the value $S(w_Q)$. $P$ is said to be $\delta$-sure for $w$ if its satisfaction by $w$ is either at
least $1 - \delta$ or at most $\delta$.

The requirement that all sets corresponding to constraints be distinct allows us (given a
particular constraint set $\mathcal{F}$) to identify the distribution $\mu$ with the corresponding distribution over
subsets of $[n]$ only. This we will do throughout the sequel, but first let us justify this requirement.

Lemma 4.2. The requirement that the members of $\mathcal{F}$ have distinct query sets is without loss
of generality.

Proof. If $C_1 = (Q, S_1)$ and $C_2 = (Q, S_2)$ are two constraints in a formula $P = (\mathcal{F}, \mu)$ (that for
now does not satisfy the distinct set requirement), then we define $\mathcal{F}'$ by replacing them with
$C = (Q, S)$ where $S = (\mu(C_1) \cdot S_1 + \mu(C_2) \cdot S_2)/(\mu(C_1) + \mu(C_2))$, and define the corresponding $\mu'$
by setting $\mu'(C) = \mu(C_1) + \mu(C_2)$. This preserves satisfaction values over all words $w$. We can
continue doing this until there are no pairs left of constraints sharing the same query set. □
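
The merging step of this proof can be illustrated as follows. This is a minimal sketch under assumed conventions that are not from the text: a constraint is a triple `(Q, S, weight)`, where `Q` is a `frozenset` of indexes, `S` is a dictionary from assignments (tuples ordered by `sorted(Q)`) to values in $[0,1]$, and all weights are positive. The function names are hypothetical.

```python
from collections import defaultdict

def merge_duplicate_query_sets(constraints):
    """Merge all constraints sharing a query set into one weighted-average constraint."""
    groups = defaultdict(list)
    for Q, S, weight in constraints:
        groups[Q].append((S, weight))
    merged = []
    for Q, items in groups.items():
        total = sum(weight for _, weight in items)   # new weight: sum of the old ones
        assignments = items[0][0].keys()
        # New satisfaction function: weighted average of the merged ones.
        S_new = {v: sum(weight * S[v] for S, weight in items) / total
                 for v in assignments}
        merged.append((Q, S_new, total))
    return merged

def satisfaction(constraints, w):
    """Expected value of S(w_Q) when (Q, S) is drawn according to the weights."""
    return sum(weight * S[tuple(w[j] for j in sorted(Q))]
               for Q, S, weight in constraints)
```

Since each merged function is the weighted average of the functions it replaces, `satisfaction` returns the same value before and after merging, for every word `w`.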

We shall henceforth abuse notation, and indeed refer to $\mu$ both as a distribution over $2^{[n]}$
and as a distribution over $\mathcal{F}$. Also, we shall make liberal use of the assumption (without loss of
generality) that the support of $\mu$ is the entire $\mathcal{F}$ (otherwise we replace it with the appropriate
subset).

A non-adaptive 2-sided test or partial test can be described as follows. 
Definition 4.3 (alternative definition of non-adaptive tests). Given two properties $L' \subseteq L \subseteq \Xi^n$,
a non-adaptive 2-sided partial $(\epsilon,\delta)$-test for $(L',L)$ is a probabilistic formula whose
satisfaction over any $w \in L'$ is at least $1 - \delta$, while its satisfaction for any $w$ that is $\epsilon$-far from
$L$ is at most $\delta$.

If $L' = L$ then we just call it a 2-sided $(\epsilon,\delta)$-test for $L$.

If the test uses a $q$-formula then we may also call it an $(\epsilon,\delta,q)$-test, and if it uses a $(q,k)$-formula
then we may call it an $(\epsilon,\delta,q,k)$-test.

To convert a non-adaptive test to this definition, we take $\mu$ to be the query distribution
corresponding to the test, and set each pair $(Q, S)$ so that $S$ describes the acceptance
probability of the test given each possible outcome of its queries to $Q$.

We need the following technicality for pairs {L',L). It is safe to restrict our discussion to 
such pairs because otherwise there exists a corresponding trivial partial test. 

Definition 4.4. Given two properties $L' \subseteq L \subseteq \Xi^n$, we say that the pair $(L',L)$ is $\epsilon$-nontrivial
if there exist some word in $L'$ and some word $\epsilon$-far from $L$.

The main purpose of this section is to show that all tests can be made to obey certain 
restrictions, at some reasonable cost for their parameters. To formulate the main lemma we 
need to define what these restrictions may be. 

Definition 4.5 (restrictions on formulas and tests). A probabilistic formula $P$ is said to be
zero-one if all its constraints have the range $\{0,1\}$ (instead of the whole interval).

$P$ is said to be $\beta$-equitable if for every two constraints $C_1$ and $C_2$ in the support of the
corresponding distribution $\mu$, we have $\mu(C_1) \le \beta\mu(C_2)$. In particular, for a 1-equitable formula
the distribution $\mu$ is uniform over its support.

A $q$-formula $P$ is said to be combinatorial if it is zero-one and 1-equitable.

We use the same adjectives for tests. For example, a test is called combinatorial if its
corresponding formula is combinatorial.

We will prove the main combinatorialization lemma of this section following a sequence of 
steps. The easiest of these steps is making the corresponding formula zero-one. 

Lemma 4.6. A formula $P$ can be made into a zero-one formula $P'$ without any change in its
other parameters (including also support size and equitability), so that for any input about which
$P$ was $\delta$-sure, $P'$ will be $2\delta$-sure, in the same direction.

Proof. For every constraint $C = (Q, S)$ in $\mathrm{Supp}(\mu)$, we replace it with $C' = (Q, S')$, where $S'$
is defined such that $S'(v) = 0$ if $S(v) < 1/2$ and otherwise $S'(v) = 1$. We leave $\mu$ "unmodified",
that is, the new $\mu'$ is defined by having $\mu'(C') = \mu(C)$, in particular remaining identical as a
distribution over query sets.

We present here the analysis for the case where the satisfaction of $P$ by $w \in \Xi^n$ is at most
$\delta$. The case where it is at least $1 - \delta$ is symmetric. Given such a $w$, we set $\mathcal{F} = \mathrm{Supp}(\mu)$, and
let $\mathcal{F}_2$ be the set of clauses whose satisfaction by $w$ is at least $1/2$. Clearly $\mu(\mathcal{F}_2) \le 2\delta$. The
satisfaction of $P'$ by $w$ is now bounded by $0 \cdot \mu(\mathcal{F} \setminus \mathcal{F}_2) + 1 \cdot \mu(\mathcal{F}_2) \le 2\delta$. □
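
The rounding in this proof is simple enough to check numerically. The sketch below assumes the same hypothetical representation as before (a constraint is a triple `(Q, S, weight)` with `S` a dictionary from assignment tuples to values in $[0,1]$) and rounds at $1/2$, the threshold used in the analysis above.

```python
def round_to_zero_one(constraints):
    """Replace every satisfaction value s by 0 if s < 1/2 and by 1 otherwise."""
    return [(Q, {v: (0 if s < 0.5 else 1) for v, s in S.items()}, weight)
            for Q, S, weight in constraints]

def satisfaction(constraints, w):
    """Expected value of S(w_Q) when (Q, S) is drawn according to the weights."""
    return sum(weight * S[tuple(w[j] for j in sorted(Q))]
               for Q, S, weight in constraints)
```

For every word, the rounded satisfaction is at most twice the original one, and the rounded "dissatisfaction" (one minus the satisfaction) is at most twice the original one, which is the $\delta \to 2\delta$ loss stated in the lemma.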

In the sequel we will need to analyze formulas conditioned on subsets of the original
constraint set.

Definition 4.7. Given a probabilistic formula $P = (\mathcal{F}, \mu)$ and $\emptyset \ne \mathcal{F}' \subseteq \mathcal{F}$, the $\mathcal{F}'$-conditioned
formula is $P' = (\mathcal{F}', \mu')$ where $\mu'$ is $\mu$ conditioned on the event that a member from $\mathcal{F}'$ was
chosen.
The following fact about conditioned formulas is trivial.

Observation 4.8. Given $P = (\mathcal{F}, \mu)$, if $\mathcal{F}' \subseteq \mathcal{F}$ satisfies $\mu(\mathcal{F}') \ge \eta$, then for every input about
which $P$ was $\delta$-sure, the conditioned formula $P'$ will be $\delta/\eta$-sure, in the same
direction.

Proof. Again we analyze the case where the satisfaction of $P$ by $w \in \Xi^n$ is at most $\delta$, as the
case where it is at least $1 - \delta$ is symmetric. For such $w$ we write:

\[ \sum_{(Q,S) \in \mathcal{F}'} \mu'(Q) S(w_Q) = \sum_{(Q,S) \in \mathcal{F}'} \mu(Q) S(w_Q) / \mu(\mathcal{F}') \le \delta/\eta \]

where in the symmetric case we refer to $(1 - S(w_Q))$ instead of $S(w_Q)$. □

We next prove a lemma (which like most preceding ones holds also for formulas which are 
not tests), that allows us to move from /3-equitable formulas all the way to 1-equitable ones. 
For making the transition cost not too high, we prove first a “quantization” step. 

Lemma 4.9. A $\beta$-equitable formula $P = (\mathcal{F}, \mu)$ can be made into a formula $P' = (\mathcal{F}, \mu')$ for
which $\mu'$ has at most $\log(2\beta)$ possible values, so that for any input about which $P$ was $\delta$-sure,
$P'$ will be $2\delta$-sure, in the same direction. Moreover, since $P'$ has the same $\mathcal{F}$ and
the same support, it preserves the original support size, query size, and zero-one property (if it
existed) of $P$.

Proof. We first define $\tilde{\mu}$ by setting for every $C \in \mathcal{F}$ the value $\tilde{\mu}(C)$ to be $2^{-k_C}$, where $k_C$ is the
largest integer for which $2^{-k_C} \ge \mu(C)$. Clearly for every $C$ we have $\mu(C) \le \tilde{\mu}(C) \le 2\mu(C)$, and
clearly $\tilde{\mu}$ has at most $\log(2\beta)$ possible values. However, it is not a probability measure, because
it may be that $\tilde{\mu}(\mathcal{F}) > 1$. We thus set $\mu'(C) = \tilde{\mu}(C)/\tilde{\mu}(\mathcal{F})$ for every $C \in \mathcal{F}$.

Finally, if the satisfaction of $P$ by $w$ is at most $\delta$, we write:

\[ \sum_{(Q,S) \in \mathcal{F}} \mu'(Q) S(w_Q) \le \sum_{(Q,S) \in \mathcal{F}} \tilde{\mu}(Q) S(w_Q) \le 2 \cdot \sum_{(Q,S) \in \mathcal{F}} \mu(Q) S(w_Q) \le 2\delta \]

where again the case of the satisfaction being at least $1 - \delta$ is symmetric. □
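
The quantization step can be sketched as follows: every probability is rounded up to the nearest power of two and the result is renormalized. The function name and the dictionary representation of $\mu$ are assumptions of this example; `math.frexp` is used so that the rounding is exact in floating point.

```python
import math

def quantize_weights(mu):
    """mu: dict mapping each constraint to a positive probability.

    Returns (mu_tilde, mu_prime): the power-of-two round-up and its
    normalization to a probability distribution.
    """
    def power_of_two_ceiling(x):
        # x = mantissa * 2**exponent with 0.5 <= mantissa < 1
        mantissa, exponent = math.frexp(x)
        return x if mantissa == 0.5 else math.ldexp(1.0, exponent)

    mu_tilde = {c: power_of_two_ceiling(p) for c, p in mu.items()}
    total = sum(mu_tilde.values())          # at least 1, since mu_tilde >= mu
    mu_prime = {c: p / total for c, p in mu_tilde.items()}
    return mu_tilde, mu_prime
```

Because the normalizing total is at least 1, every normalized weight is at most the rounded weight, which in turn is at most twice the original one; this is the chain of inequalities used in the displayed calculation of the proof.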

Lemma 4.10. A $\beta$-equitable formula $P = (\mathcal{F}, \mu)$ can be made into a 1-equitable formula $P' =
(\mathcal{F}', \mu')$, so that for any input about which $P$ was $\delta$-sure, $P'$ will be $2\delta\log(2\beta)$-sure,
in the same direction. Moreover, $\mathcal{F}' \subseteq \mathcal{F}$, so $P'$ preserves the support size bound, query
size bound, and possible zero-one property of $P$.

Proof. We first use Lemma 4.9 to move from $P$ to $P'' = (\mathcal{F}, \mu'')$, where $\mu''$ has at most $\log(2\beta)$
possible values, and any input about which $P$ was $\delta$-sure, $P''$ is $2\delta$-sure about. Now there
must be some $\eta \in (0,1]$ so that $\mathcal{F}_\eta = \{C \in \mathcal{F} : \mu''(C) = \eta\}$ satisfies $\mu''(\mathcal{F}_\eta) \ge 1/\log(2\beta)$.

We set $\mathcal{F}' = \mathcal{F}_\eta$ and make $P'$ the formula of $P''$ conditioned on $\mathcal{F}'$. We finalize the proof
by appealing to Observation 4.8. □

Specifically for (partial) tests, we next show some correlation between subsets “covering” 
few indexes and probability. The following lemma will also be used again when constructing 
sets of pompoms to prove the main conversion result from 2-sided tests to sampling tests. 

Lemma 4.11. If a formula $P = (\mathcal{F}, \mu)$ corresponds to an $(\epsilon/2,\delta)$-test for $(L',L)$ which is
$\epsilon$-nontrivial, and $\mathcal{F}' \subseteq \mathcal{F}$ is such that the union of its corresponding query sets occupies at most
$\epsilon n/2$ indexes from $[n]$, then $\mu(\mathcal{F}') \le 2\delta$.
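Proof. Let $T = \bigcup_{(Q,S) \in \mathcal{F}'} Q$, let $u \in \Xi^n$ be a word in $L'$, let $w \in \Xi^n$ be $\epsilon$-far from $L$, and let $v$ be such
that $v_T = u_T$ and $v_{[n] \setminus T} = w_{[n] \setminus T}$. By the triangle inequality, $v$ is $\epsilon/2$-far from $L$, and so

\[ \sum_{(Q,S) \in \mathcal{F}'} \mu(Q) S(u_Q) = \sum_{(Q,S) \in \mathcal{F}'} \mu(Q) S(v_Q) \le \delta \]

since this sum is bounded by the satisfaction of $P$ by $v$.

On the other hand, the satisfaction of $P$ by $u$ is at least $1 - \delta$, and so we obtain

\[ 1 - \delta \le \sum_{(Q,S) \in \mathcal{F}'} \mu(Q) S(u_Q) + \sum_{(Q,S) \in \mathcal{F} \setminus \mathcal{F}'} \mu(Q) S(u_Q) \le \delta + (1 - \mu(\mathcal{F}')) \]

which necessarily means that $\mu(\mathcal{F}') \le 2\delta$. □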

Using the above lemma we can show that tests with a small enough support size can be 
made into equitable ones. 

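Lemma 4.12. For $\delta \le 1/8$, an $(\epsilon/2,\delta,q,\alpha n)$-test for $(L',L)$ which is $\epsilon$-nontrivial can be made
into a $\beta$-equitable $(\epsilon/2,2\delta,q,\alpha n)$-test for $(L',L)$, for $\beta = 8q\alpha/\epsilon$. This transformation also
preserves the zero-one property if it existed.

Proof. Let $P = (\mathcal{F}, \mu)$ be the formula corresponding to the test. First we let $\mathcal{F}_0 = \{C \in
\mathcal{F} : \mu(C) < 1/4\alpha n\}$. Clearly $\mu(\mathcal{F}_0) \le 1/4$. Now let $\mathcal{F}_1 = \{C \in \mathcal{F} : \mu(C) > 2q/\epsilon n\}$. Clearly
$|\mathcal{F}_1| \le \epsilon n/2q$. Hence $|\bigcup_{(Q,S) \in \mathcal{F}_1} Q| \le \epsilon n/2$, so by Lemma 4.11 we have $\mu(\mathcal{F}_1) \le 2\delta \le 1/4$.

Setting $\mathcal{F}' = \mathcal{F} \setminus (\mathcal{F}_0 \cup \mathcal{F}_1)$, we get $\mu(\mathcal{F}') \ge 1/2$. Setting $P'$ to be the conditioning of $P$ to $\mathcal{F}'$,
we obtain by Observation 4.8 that it is an $(\epsilon/2,2\delta,q,\alpha n)$-test for $(L',L)$. Moreover, since for
every $C \in \mathcal{F}'$ we have $1/4\alpha n \le \mu(C) \le 2q/\epsilon n$, we get that the resulting test is $\beta$-equitable for
$\beta = 8q\alpha/\epsilon$. □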

We now have nearly all the ingredients we need. The final one is a way to convert a
general test to one whose support size is linear in $n$, which the following lemma provides even
for formulas that are not necessarily tests.

Lemma 4.13. Any $q$-formula $P = (\mathcal{F}, \mu)$ can be made into a $(q, \alpha n)$-formula $P'$ for $\alpha =
\delta^{-2}\log(|\Xi|)$, such that for any $w \in \Xi^n$ about which $P$ was $\delta$-sure, $P'$ will be
$2\delta$-sure, in the same direction. This also preserves the zero-one property if it exists.

Proof. To produce the new formula, we take r = <5“^ log(|H|) • n samples {Qi, Si),..., {Qr, Sr) 
from P by independently drawing each sample according to /r. For rc G we set = 
We also let rj = Y1{q s)eT h{Q)Si'^Q) denote the satisfaction of P by 
w. Let Yi denote the random variable Si{wQ^), and set Y = X^[=iUi/n. Note that E[yj] = 
^(g 5 )gjr ix{Q)S{wq) = ij. Thus also E[y] = r], and by Lemma [2l^ we have that the probability 

for \rju, — -ql > 5 IS bounded by 26“^^*^^ < 

Thus, with probability at least the obtained sequence is such that for all tc G E"" we have 
that the difference between and ry is at most 5. We fix such a sequence (Qi, ^i),..., {Qr, Sr). 
To define $P' = (\mathcal{F}', \mu')$, we set $\mathcal{F}'$ to be the set of clauses appearing in $(Q_1, S_1), \ldots, (Q_r, S_r)$,
where for $C \in \mathcal{F}'$ we set $\mu'(C)$ to be the number of times it appeared in the sequence, divided
by $r$. □
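
The construction of $P'$ from a sampled sequence can be sketched as below. Note the limits of the sketch: the proof fixes a sequence that is good for all words simultaneously, whose existence follows from the probabilistic argument, while this code merely draws one random sequence and does not verify that it is good. The function name, the list-based representation and the `seed` parameter are assumptions of this example.

```python
import random
from collections import Counter

def sparsify(constraints, weights, r, seed=0):
    """Draw r independent samples from the formula and return the empirical formula.

    constraints: list of constraints; weights: their probabilities under mu.
    Returns (new_constraints, new_weights), where each weight is the number of
    times the constraint was drawn, divided by r.
    """
    rng = random.Random(seed)
    sample = rng.choices(range(len(constraints)), weights=weights, k=r)
    counts = Counter(sample)
    new_constraints = [constraints[i] for i in counts]
    new_weights = [counts[i] / r for i in counts]
    return new_constraints, new_weights
```

The support of the resulting formula has size at most `r`, which is how the bound $\alpha n$ on the support size arises when `r` is chosen as in the proof.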

Now we are finally ready to prove the main combinatorialization result of this section. 

Lemma 4.14 (combinatorialization lemma). Any (partial) $(\epsilon/2,\delta,q)$-test for $(L',L)$ which is
$\epsilon$-nontrivial can be made into a combinatorial $(\epsilon/2,\delta',q,\alpha n)$-test, where $\alpha = \delta^{-2}\log(|\Xi|)$ and
$\delta' = 16\delta\log(16q\delta^{-2}\log(|\Xi|)/\epsilon)$.
Proof. Setting P to be the formula corresponding to the test, we can assume that 5 < ^, as
otherwise we can just ignore P and provide a “test” that is satisfied by all inputs. We perform 
the following sequence of steps. 

• Use Lemma 4.13 to make it into a formula $P_1$ corresponding to an $(\epsilon/2, 2\delta, q, \alpha n)$-test for
$\alpha = \delta^{-2}\log(|\Xi|)$.

• Use Lemma 4.12 to make $P_1$ into a formula $P_2$ that is an $(8q\delta^{-2}\log(|\Xi|)/\epsilon)$-equitable
$(\epsilon/2, 4\delta, q, \alpha n)$-test. This is the only step that requires the formula to correspond to a
test.

• Use Lemma 4.10 to make $P_2$ into $P_3$ which is an $(\epsilon/2, 8\delta\log(16q\delta^{-2}\log(|\Xi|)/\epsilon), q, \alpha n)$-test
that is 1-equitable.

• Finally use Lemma 4.6 to make $P_3$ into the formula $P'$, which is a combinatorial partial
$(\epsilon/2, 16\delta\log(16q\delta^{-2}\log(|\Xi|)/\epsilon), q, \alpha n)$-test for $(L',L)$.

The final formula P' is the required test. □ 

Before concluding this section, we note that by analyzing the only place in the proof where 
we used that P is a test, we can formulate the following more general “combinatorialization 
lemma” which could be of independent use. 

Lemma 4.15 (general combinatorialization). A probabilistic $q$-formula $P$ resolving a problem with $\delta$
confidence, for which there are both "yes" instances and words where every $\epsilon/2$-close word is
a "no" instance, can be made into a combinatorial $(q,\alpha n)$-formula resolving the same problem
with $\delta'$ confidence, where $\alpha$ and $\delta'$ are as in Lemma 4.14.

To conclude this section, we combine Lemma l4.141 with an amplification technique to show 
how general 2-sided (e/2, (5, g)-tests can be converted to combinatorial tests. 

Lemma 4.16. A partial $(\epsilon/2,\delta,q)$-test for $(L',L)$ which is $\epsilon$-nontrivial can be converted to
a combinatorial partial (e/2, l/(4g')^, g', g'® log(|H|)(log(log(|H|)/e))^n)-test/or {L',L), where 
q' = C>(glog(g)loglog(log(|H|)/e)/(^ -^f)- 

Proof. We first use 2-sided amplification for the original test: We repeat the original test 
201og(g) loglog(log(|S|)/e)/(^ — (5)^ times and take the majority vote. This brings us to an 
(e/2, l/40g'^ log(log(|H|)/e),g')-test for q' = 0(glog(g) loglog(log(|H|)/e)/(^ — We use 

Lemma [4.141 and obtain from it a combinatorial (e/2, l/4g'^, g', g'® log(|H|)(log(log(|H|)/e))^n)- 
test as required for the lemma’s conclusion. □ 

We note that the dependency of $q'$ above on $\log\log(\log(|\Xi|)/\epsilon)$ implies that a constant power
of $n$ is guaranteed only for properties for which the alphabet does not depend on $n$. However, it
is unlikely to destroy sublinearity by itself even for a variable alphabet setting, because already
for $|\Xi| = 2^n$ there are examples with no sublinear sampling tests at all by [9].

5 A conversion of a 2-sided test to a 2-sided sampling test 

Here we prove that if the properties $L' \subseteq L \subseteq \Xi^n$ admit a 2-sided test with a constant number
of queries for $(L', L)$, then there is a corresponding 2-sided $p$-sampling test where $p$ corresponds
to a constant negative power of $n$. Specifically we prove the following.
Theorem 5.1. Let a = 10^1n(|H|) • g^/e. For every q > 3 and n > (24g'^°(log(|H|))^/e)'?, if
{L',L) over E"^ admits a2-sided combinatorial (e/2, 1 / 4 ( 7 ^, g, g® log(|H|)(log(log(|H|)/e))^n)-iest 
and is e-nontrivial, then it also admits a p-sampling 2-sided (e, l/10)-test such that p = an~^l^ . 

As with the proof of the 1-sided case, we set $\mu$ to be a distribution of the test, and find for
it pompoms that cover every possible assignment to a common core set $C$. Here however there
are many pompoms involved, and they serve all assignments to $C$ at once, because we need to
cover enough of the "weight" of the distribution $\mu$. Also, the pompoms are not necessarily of
witnesses, but rather of query sets; we will use them to approximate for every assignment of
$C$ the "amount" of query sets that would cause rejection (hence they need to cover sufficient
weight). We need the test to be combinatorial, i.e., that $\mu$ is uniform over its support, in exactly
one place: the sampling test will approximate the number of rejecting query sets, and only for
a uniform $\mu$ will this correspond to the rejection probability of the original test.

Let us formally define the set of pompoms that we will use. 

Definition 5.2. Given a distribution p over sets of size q, a set J of i-pompoms made from 
members of Snpp{p) is discerning for p if the following holds: 

1. p{X) > where X = UyveJ^ union of all the i-pompoms in J. 

2. Every i-pompom in J has cardinality exactly e ■ 

3. There exists C C [n] of size at most g^® log(|H|)(log(log(|H|)/e))^n^“*/'^ that is a core of 
all the i-pompoms in J. 

Next we define and show how each pompom of such a set can be used for the proof of 
something like Theorem 15.11 

Definition 5.3. Given a probabilistic q-formula P = {P,p) over E^, an i-pompom W C 
Supp{p) of sizee-n}~'^'^~^^l^ jifii) with core C of size at most g^® log(|H|)(log(log(|H|)/e))^n^“*/^, 
a word w G H"", a possible assignment a G to C, and a query set U 'X [n], the approximated 
satisfiability in W of cr with respect to w is defined to be the value '^a,u,w obtained in the 
following manner. 

Set Wu = {Q G W : Q\C C U} (i.e., take the set of members ofW whose indexes outside C 
are contained in U), and then take the average 'ya,u,w = (Z]{(q S)eJ'-QeW[/} ^iiwa,c)Q))/\y^u\, 
which we arbitrarily set to ^ ifWu = 0. 

An explanation of the above definition: Assume that $U$ is a set of queries that we have
made. We would like to assess the assignment $\sigma$ of $C$, with respect to what $U$ tells us about
$w$ outside of $C$. Given the $i$-pompom $\mathcal{W}$, we want to approximate the relative weight of the
members of $\mathcal{W}$ for which the corresponding constraints accept $w_{\sigma,C}$. We do so by restricting
ourselves to the members of $\mathcal{W}_U$, for which we can tell by querying $U$ whether they accept
$w_{\sigma,C}$ or not. We ignore all aspects of $\mu$ apart from its support, because we will assume that
it is uniform over $\mathrm{Supp}(\mu)$ (i.e., that the formula $P$ corresponds to a combinatorial test). This
assumption is essential to show that a set $U$ chosen according to a sampling distribution will
indeed yield with high probability a good approximation.
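
The computation described in this explanation can be sketched as follows. The representation is an assumption of this example: a pompom is a list of pairs `(Q, S)` with `Q` a `frozenset` and `S` a dictionary from assignment tuples (ordered by `sorted(Q)`) to values, `sigma` maps each index of the core to a symbol, and `w` is indexable by position. The value returned when no member of the pompom is visible is exposed as the parameter `default`; taking it to be $1/2$ is an assumption of this sketch, since the definition only fixes some arbitrary value.

```python
def approx_satisfiability(pompom, core, sigma, w, U, default=0.5):
    """Average acceptance, over the pompom members visible through U, of the
    word that agrees with sigma on the core and with w elsewhere."""
    # Members whose indexes outside the core were all queried.
    visible = [(Q, S) for Q, S in pompom if (Q - core) <= U]
    if not visible:
        return default

    def value(j):
        # sigma overrides w on the core; w is read only inside U.
        return sigma[j] if j in core else w[j]

    total = sum(S[tuple(value(j) for j in sorted(Q))] for Q, S in visible)
    return total / len(visible)
```

Note that the function reads `w` only at indexes of `U`, so it can be evaluated from the queries made by the sampling test alone, for every assignment `sigma` to the core.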

Note that $\gamma_{\sigma,[n],\mathcal{W}}$ is the true acceptance average of the pompom $\mathcal{W}$. We now prove that
the sampling distribution with high probability provides a $U$ such that $\gamma_{\sigma,U,\mathcal{W}}$ approximates
$\gamma_{\sigma,[n],\mathcal{W}}$.

Lemma 5.4. Let q> 3, n> (24g^®(log(|H|))^/e)'^, a = 10^1n(|H|) ■ g^/e and w G E^. Suppose 
that the formula P = {P,p), the i-pompom W and its core C, and the words w and a are 
as per the requirements of Definition \5.iA and additionally that P is combinatorial. Then with
probability at least 1 — a set U drawn according to the -sampling distribution 

satisfies \^a,U,'VV 7(T,[n],wl — 10 ' 


Proof. Let us first analyze which members of W get into Wu- Since {Q \ C : Q & W} is a 
family of disjoint sets of size i, the choice of U means that every Q € W becomes a member of 
Wif with probability exactly a* , independently of other members of W. We now refer to 

LemmaEH (where ( 71 ,, 7 ^) there are the satisfaction values S{{wa,c)Q) for {Q, S) € P such 
that Q G W), which implies that the probability of having \ ja,u,w — To-jnj.wl > fo®® than 
g-io-3a*.n-V'jLeni-(»-i)/V(3i). Calculation [5] bounds it by < 

1001^1 ■ 


From the above lemma we formulate a way of approximating all pompoms in a discerning
set $\mathcal{J}$, assuming that we have knowledge of $\mathcal{J}$, the common core set $C$, and of course the original
combinatorial test $(\mathcal{F}, \mu)$.

Lemma 5.5. Assume that q > 3, n > (24q^‘’(log(|H|))^/e)'^ and w € H". Let P = {P,p) be a 
combinatorial (e/2, l/4g^, ( 7 , ( 7 ® log(|H|)(log(log(|H|)/e))^n)-test for {L',L) which is e-nontrivial, 
let J be a discerning set of i-pompoms for it with core C, and let U be chosen by the a ■ n~^l^ - 
sampling distribution where a = 10^ ln(|H|) -g^/e. With probability at least for every a G 
it holds that Ijji X^wgJ 'y^,u,y^ ~ jjj XyWeJ 7(T,[n],wl — 5 • 

Proof. For a hxed a G by using Lemma 15.41 and Markov’s inequality, we obtain that with 
probability at most we have more than J| instances W G J for which \ja,u,w — 

7o-,[n],wl > 1 ^- Therefore, with probability at least (noting that every 7 value is always 

between 0 and 1 ) the following holds: 


^ 2^ 7<7.t/,W -TP 2 ^ 7<x,[n],wl ^TP 2 ^ - Ta,[n],w\ < + JF = 7 

' ' weJ ' ' weJ ' ' wgJ 


A union bound over the bad events for every possible $\sigma \in \Xi^{|C|}$ concludes the proof. □

We now show that a constellation as defined in Definition 3.7 implies a discerning set of
pompoms, just as for the 1-sided case we used it to find a revealing set of pompoms. Later we
will use Definition 3.9 and Lemma 3.13 to find the required constellation, just as we did for the
case of 1-sided tests.


Lemma 5.6. Let i G [g] and n > (24(7^°(log(|H|))^/e)'?, {L',L) be e-nontrivial, P = {P,p) be 
a combinatorial (e/2, l/4g^, q, log(|H|)(log(log(|H|)/e)))^n)-test/or {L',L), and let {C,S) be 
a {q^^ log{\E\){log{log{\E\)/e))‘^,i)-constellation for p. Then there exists a set J of i-pompoms 
that is discerning for P with core C. 

Proof. We extract pompoms from S greedily. We claim that as long as p{S) > we can 

extract an i-pompom W of size e • n^“*'*“^^/'^/(3i) with center C from S, which we then subtract 
from S and make into a new member of J. Assuming the claim holds, the process stops only 
when J becomes such that Item [1] of Dehnition holds, because we started with a set S of 
weight at least Also, Item [3] of Definition 15.21 follows from Condition [1] of Definition 13.71 
(regarding S and C), while Item [2] follows from the construction described above. 

It thus remains to show the following claim: Given a set S' P S for which p{S') > 

where (<5,(7) is a (g^° log(|H|)(log(log(|H|)/e))^, z)-constellation for p, an i-pompom W C 5fof 
size e • n^“(*“^)/'^/(3i) with center (7 exists. 
Since /x(5') > 2 (^q:T)’ Lemma [4.111 we have that |Uqg 5 'QI ^ ere/2. Defining B' =
{Q \ C ■. Q ^ 5'}, we observe that lUgeB'QI ^ IUqgs'QI “ 1^*1 > ere/3, because re > 
(24g^°(log(|H|))^/e)'^ and so |C| < ere/ 6 . Thus, by Condition HI of Definition ITTI there exist 
a disjoint family V C of e • re^“(*“^)/'?/(3i) sets. Thus the family W = {Q : Q \ C G V} is the 
required i-pompom for the claim. □ 

Now we prove the main result of this section. 

Proof of Theorem A5.1[ Given a combinatorial (e/2, q, q^ log(|H|)(log(log(|H|)/e)))^re)-test 

for (L',L), where re > (24(7^°(log(|H|))^/e)'^, we construct for a = 10^1n(|H|) • q'^/e a 2-sided 
(e, ^)-test for (L', L) that uses the a ■ n~^l^ -sampling distribution. 

First we construct for the distribution of the combinatorial test the families A4j, Ci and 
Si as per Definition 13.91 Similarly to the proof of Theorem 13.11 every set in <So is in particular 
a subset of Cq, and so the union of all the sets in Sq is also a subset of Cq. Therefore, by 
Item [T] of Observation 13.101 lUgeSo^l — ^ log(l^l)(log(log(|H|)/e))^re^“^/'? < ere/ 2 , 

where the last inequality follows from re > (24(7^^(log(|H|))^/e)'?. Therefore, by Lemma 14.111 
/^(•^o) < l/{q+l), so by Lemma[3T3]there exists a log(|H|)(log(log(|H|)/e))^, i)-constellation 
for some i & Q. 

Moreover, we note that such a constellation can indeed be computed from only the knowledge
of $\mathrm{Supp}(\mu)$. We set $(\mathcal{S}, C)$ to be such a constellation, and then use Lemma 5.6 (which is also
constructive) to obtain the discerning set $\mathcal{J}$ of $i$-pompoms.

The test proceeds as follows. Given the set 17 produced by the a ■ -sampling distribu¬ 

tion, we query all of it. Then, for every a G and every W ^ J, calculate 7o-,[/,w using 
our queries, and then calculate 7 o-,f/ = X^vveJ every a. If there was a cr G for 

which 7 o-,[/ > then we accept the input, and otherwise we reject it. 

It remains to prove that this is indeed a correct test for (L',L). Set Z = UweJ ^ P®'- 

Definition 15.21 Since /r(T) > if ^ is any word for which the original test was l/4g^-sure 

about, then the conditioning of the test to the set of constraints corresponding to the members 
of Z will be ^-sure for u by Observation 14.81 Now, since /z is uniform over its support, for any 
a G the satisfaction of the original test conditioned on Z by Wa,c is identical to the average 
7(T,[n] = X^vvgJ^ o-,[n],w- turn. Lemma E3] guarantees that with probability at least for 
all such cr we have \'ya,u ~lcr,{n] \ ^ Assume from now on that this event has indeed occurred. 

If w was a word in L', then the original test accepted it with probability at least 1 — l/4g^, 
and hence for a = wc (for which Wafi = w) we have 7 o-,[n] > ^ and hence ^^,11 > and the 
sampling test will accept on account of this cr. 

On the other hand, if w was a word e-far from L, then for every cr G the word zCo-,c 
is e/2-far from L (recall that in particular \C\ < enjZ)^ and so the original test will accept it 
with probability at most l/4g^. Hence 7 o-,[n] < for every such cr, and hence 7 ^, ,(7 < ^. This 
means that the sampling test will reject, as there will be no cr on whose account the test can 
accept. □ 

6 Implications of our results 

The following corollaries result respectively from Theorem 3.1 and Theorem 5.1, considering
that a multitest scheme (as described in the introduction) immediately leads to a test for a
union of the properties.

Corollary 6.1. Let q = 30q' log(q')/(1 − δ) and α_1 = log(|Σ|)q^/ε. For every n > (24q(q + 1)^(log(|Σ|))^/ε)'^, if L ⊆ Σ^n is the union of r properties L_1, ..., L_r each having 1-sided (ε/2, δ, q')-tests where r < then L has a non-adaptive 1-sided (ε, 1/2)-test with query complexity O(n^~'^).

Proof. First, we use Lemma 2.10 to convert the (ε/2, δ, q')-test for every L_j to a non-adaptive
1-sided (ε/2, 1/2(q + 1), q, 4(q + 1)^ log(|Σ|)n)-test where q = 30q' log(q')/(1 − δ). We then
use Theorem 3.1 to obtain an α_1 n ^/'^^-sampling 1-sided (ε/2, 1/2)-test for every L_j. We then
amplify the probability, by repeating the test and rejecting if any of the runs rejects, to obtain
a log(2r)α_1 n“^/'^ -sampling 1-sided (ε/2, 1/2r)-test for every L_j. We construct a multitest for
L_1, ..., L_r, which reuses the same queries for each sample-based test, and from it derive the test
for L. Since r is at most 2(“i) this gives a non-adaptive 1-sided (ε, 1/2)-test
with query complexity □
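The amplification and query-reuse steps of this proof can be illustrated by a minimal Python sketch. It is not the construction of Theorem 3.1: the interface `SampleTest`, the helper names, the sampling probability `p` and the toy sub-properties are all assumptions made here for illustration. The only point it shows is that, since the sample does not depend on the sub-test, one round of queries can serve every sub-test, and a 1-sided sub-test is amplified by rejecting whenever any of its runs rejects.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a sample-based test sees only the sampled
# positions and their values, and returns True to accept.
SampleTest = Callable[[Dict[int, int]], bool]


def draw_sample(n: int, p: float, rng: random.Random) -> List[int]:
    """Include every index of range(n) independently with probability p."""
    return [i for i in range(n) if rng.random() < p]


def union_multitest(word: Sequence[int], tests: Sequence[SampleTest],
                    p: float, repeats: int, rng: random.Random) -> bool:
    """Accept iff some sub-test accepts in every one of the shared runs."""
    views = []
    for _ in range(repeats):
        sample = draw_sample(len(word), p, rng)
        views.append({i: word[i] for i in sample})  # one round of queries
    # 1-sided amplification: a sub-test rejects if any of its runs rejects.
    # The queries do not depend on the sub-test, so all reuse the same views.
    return any(all(test(view) for view in views) for test in tests)


# Toy sub-properties (illustration only): the all-zero and all-one words.
def all_zero(view: Dict[int, int]) -> bool:
    return all(v == 0 for v in view.values())


def all_one(view: Dict[int, int]) -> bool:
    return all(v == 1 for v in view.values())
```

In this toy setting a word belonging to one of the sub-properties is always accepted, matching the 1-sided guarantee, while the number of queries is governed by `p` and `repeats` only and not by the number of sub-tests.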

Corollary 6.2. Let q = 20q' log(q') loglog(log(|Σ|/ε))/(1/2 − δ)^ and let α_2 = 10^ ln(|Σ|) · q^/ε.
For every n > (24q^10(log(|Σ|))^/ε)'^, if a property L ⊆ Σ^n is the union of r properties L_1, ..., L_r each having a 2-sided (ε/2, δ, q')-test where r < ^ then L has a non-adaptive 2-sided (ε, 1/10)-test with query complexity O(n^“'^).

Proof. First, use Lemma 4.16 to convert the (ε/2, δ, q')-test for each L_j to a combinatorial 2-sided
(ε/2, 1/(4q)^, q, q® log(|Σ|)(log(log(|Σ|)/ε))^2 n)-test, where q = 20q' log(q') loglog(log(|Σ|)/ε)/(1/2 − δ)^.
Then we use Theorem 5.1 to convert each of these tests to an α_2 n~^^'^ -sampling 2-sided
(ε, 1/10)-test. Now, we convert them to 10 log(r)α_2 n~^^'^ -sampling 2-sided (ε, 1/10r)-tests by
repeating each test 10 log r times independently and taking the majority vote. We construct a
multitest for L_1, ..., L_r, which reuses the same queries for each sample-based test, and from it
derive the test for L. Since r is at most ^ this gives the 2-sided (ε, 1/10)-test
with query complexity O(n^~'>'). □
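The error bound of the derived test follows from a union bound over the r amplified tests. A short derivation is sketched here, writing T_j for the amplified (ε, 1/10r)-test of L_j (a name introduced only for this illustration) and taking the derived test to accept exactly when some T_j accepts:

```latex
% Case 1: w \in L. Then w \in L_j for some j, and T_j alone suffices:
\Pr[\text{no } T_i \text{ accepts } w]
  \;\le\; \Pr[T_j \text{ rejects } w]
  \;\le\; \frac{1}{10r} \;\le\; \frac{1}{10}.

% Case 2: w is \epsilon-far from L = \bigcup_{j} L_j.
% Then w is \epsilon-far from every L_j, so each T_j accepts w
% with probability at most 1/(10r), and by the union bound:
\Pr[\text{some } T_j \text{ accepts } w]
  \;\le\; \sum_{j=1}^{r} \Pr[T_j \text{ accepts } w]
  \;\le\; r \cdot \frac{1}{10r} \;=\; \frac{1}{10}.
```

The union bound does not require independence, which is why all T_j may be evaluated on the same queried sample.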

Definition 6.3 (following Definition 2.1 of [H] ). A Merlin-Arthur proof of proximity (MAP)
for a property L ⊆ Σ^n, with proximity parameter ε, query complexity q and proof complexity p,
consists of a probabilistic algorithm V, called the verifier, that is given a proof string π ∈ Σ^p;
in addition, it is given oracle access to a word w ∈ Σ^n, to which it is allowed to make up to q
queries. The verifier satisfies the following two conditions:

1. Completeness: For every w ∈ L, there exists a string π ∈ Σ^p (referred to as a proof or
witness) such that Pr[V(n, ε, w) = 1] > 2/3.

2. Soundness: For every w ∈ Σ^n which is ε-far from L, and any π ∈ Σ^p, Pr[V(n, ε, w) =
1] < 1/3.

If the completeness condition holds with probability 1, then we say that the MAP has 1-sided
error, and otherwise we say that it has 2-sided error. Also, we may say that it is non-adaptive
if it makes its queries to w based only on n, before receiving any responses from w.

For our purposes, we note that the proof of a MAP scheme for a property L induces a
decomposition of L into sets whose union is L, each admitting a corresponding partial testing
algorithm. Specifically, for every w ∈ L we define Π_w to be any non-empty subset of the set of
proofs π ∈ Σ^p that make the verifier accept w with the required probability. Then, for every
π ∈ Σ^p we set L_π = {w ∈ L : π ∈ Π_w} (it may be the case that some L_π are empty).

Under this interpretation, for a word in the property, the proof π is simply an indicator
that the word belongs to L_π. Thus, the verifier of the MAP scheme can be seen as receiving
as input a proof π and then running a partial test for (L_π, L). Consequently, the existence of
a MAP scheme with query complexity q and proof complexity p for a property L is the same
as having a family of properties {L_π : π ∈ Σ^p} such that L = ∪_{π ∈ Σ^p} L_π and there exists a
partial test for every pair (L_π, L).
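This view of a MAP as a union indexed by proofs can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the verifier is modelled as a deterministic function of the proof and of the queried values, the sample is passed in explicitly, and the toy verifier for constant words is not taken from the paper. Enumerating all proofs over the alphabet plays the role of the union over the sets L_π.

```python
import itertools
from typing import Callable, Dict, Sequence, Tuple

Proof = Tuple[int, ...]
# Hypothetical interface: with the proof fixed, the verifier acts as a
# partial test that sees only the sampled positions and their values.
Verifier = Callable[[Proof, Dict[int, int]], bool]


def accepts_some_proof(word: Sequence[int], sample: Sequence[int],
                       verifier: Verifier, alphabet: Sequence[int],
                       proof_len: int) -> bool:
    """Accept iff at least one proof makes the verifier accept the view."""
    view = {i: word[i] for i in sample}  # queried once, reused for all proofs
    return any(verifier(proof, view)
               for proof in itertools.product(alphabet, repeat=proof_len))


def constant_word_verifier(proof: Proof, view: Dict[int, int]) -> bool:
    """Toy verifier: the one-symbol proof names the claimed constant."""
    return all(symbol == proof[0] for symbol in view.values())
```

The loop ranges over `len(alphabet) ** proof_len` proofs, so the proof complexity directly bounds the number of sub-properties in the union, which is how the bounds on r above translate into bounds on p.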

Similarly to Corollary 6.2, only using the validity of Theorem 5.1 for partial tests as well,
we obtain:

Corollary 6.4. Let q = 20q' log(q') loglog(log(|Σ|/ε))/(1/2 − δ)^ and let α_2 = 10^ ln(|Σ|) · q^/ε.
For every n > (24q^10(log(|Σ|))^/ε)'^, if a property L ⊆ Σ^n has a non-adaptive 2-sided (ε/2, 1/10)-test with query complexity of then every 2-sided MAP scheme for L, that has query complexity q', has proof complexity “'^/10α_2).

Although Theorem 3.1 was stated and proved for (non-partial) 1-sided tests only, it can also
be made to work for partial tests, and to give a corollary with an improved bound for this case.

Corollary 6.5. Let q = 30q' log(q')/(1 − δ) and let α_1 = log(|Σ|)q^/ε. For every n > (24q(q +
1)^(log(|Σ|))^/ε)'^, if a property L ⊆ Σ^n has a non-adaptive 1-sided (ε/2, 1/2) query complexity
of Ω(n^“'^), then every 1-sided MAP scheme for L, that has query complexity q', has proof
complexity Ω(n^ /α_1 − 1).

We note some concrete applications of the above results. 

• In [13], it was shown that there exists a language LS with logarithmic space complexity
that satisfies the following: every non-adaptive 2-sided (ε/2, δ)-test for LS ∩ {0,1}^n
has query complexity Ω(n). By Corollary 6.2, this means that for every large enough
n, L is not the union of less than properties over L* C rP each having
a 2-sided (ε/2, δ, q')-test where q = 20q' log(q') loglog(log(|Σ|/ε))/(1/2 − δ)^. By Corollary 6.4,
LS does not have a 2-sided MAP with query complexity q' and proof complexity
o(n^ /10α_2). Similarly, such conclusions apply to the properties of the small CNF
formula that were studied in [T].

• Our result also applies to the sparse graph property of 3-colorability, which in [3] is shown 
to have a linear 2-sided test query complexity. Note that in the sparse graph model the size 
of the alphabet Σ is n, but this is still small enough for our results to provide non-trivial
conclusions against decomposability. 

• According to the results in m and [5] respectively, every property defined by a constant 
width read-once branching program or a constant arity read-once Boolean formula is 
testable. Hence our results imply that properties whose testing requires 0(n^“'’') many 
queries cannot be written as the union of a few properties that have such representations. 
A Calculations

This appendix is reserved for calculations that are too long and bothersome to be put where 
they are originally used. 

Calculation 1. For an alphabet S, positive integers q and n > (24g(g-|-l)^(log(|S|))^/e)'^, any 
D < e < 1, i ^ [q], and a = 15In |S| • q{q -|- l)^/e, we write: 




If i > 2 then we can clearly bound this so: 


-a^-n ®/'J^'(l/3i)e-nl ^ g— In log(|S|)nl ^ _ |'^|—4(j'(5-|-l)^ log(|S|)nl 


< 


For i = 1, we use the lower bound on n to show > log(|S|)n^ and so: 


^-a-n ^ g—In |S|-5g(5-|-l)^ log(|S|)n^ ^ ^ ^ I’^i—4(j'(5-|-l)^ log(|S|)n^ 
Calculation 2. For n > (24g^‘’(log(|S|))^/e)'^, a = 10^1n(|H|) • q^je, q >3 and i G [g], for the
case i > 3 we write: 


^- 10 '^ 


. 1 . 

3i 


< 


-ln{|E:|)3ql2.e 2. 


3q 


< 


1 

Too 


log(|E:|)(log(log(|E:|/e)))2nl 


Fori = 1, we use > (24g^°(log(|S|))^/e)^“^/'^ > 8g®(log(|S|))^/^/e^/^, and fori = 2 

we use > (24g^°(log(|S|))^/e)^“^/'^ > 2g^(log(|H|))^/^. In both cases we substitute the 

value of a® and write: 


o 


_ — ^ |’^|—log(|S|)(log(log(|S|/£)))2n^ */'? 

‘ - 100 